[Phpmyadmin-git] [phpmyadmin/localized_docs] 4c1de7: Translated using Weblate (Turkish)

2015-02-17 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/localized_docs
  Commit: 4c1de76c08153d1db8f3132a2c994a27e89a4701
  
https://github.com/phpmyadmin/localized_docs/commit/4c1de76c08153d1db8f3132a2c994a27e89a4701
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2015-02-17 (Tue, 17 Feb 2015)

  Changed paths:
M po/tr.mo
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (1667 of 1667 strings)

[CI skip]




[jira] [Commented] (SPARK-5811) Documentation for --packages and --repositories on Spark Shell

2015-02-17 Thread Burak Yavuz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323795#comment-14323795
 ] 

Burak Yavuz commented on SPARK-5811:


The documentation is not really blocked, but I want to test what I write in the
documentation before submitting a PR, and I'm hitting the issue SPARK-5857.

 Documentation for --packages and --repositories on Spark Shell
 --

 Key: SPARK-5811
 URL: https://issues.apache.org/jira/browse/SPARK-5811
 Project: Spark
  Issue Type: Documentation
  Components: Deploy, Spark Shell
Affects Versions: 1.3.0
Reporter: Burak Yavuz
Priority: Critical
 Fix For: 1.3.0


 Documentation for the new support for dependency management using maven 
 coordinates using --packages and --repositories






[jira] [Created] (SPARK-5857) pyspark PYTHONPATH not properly set up?

2015-02-16 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-5857:
--

 Summary: pyspark PYTHONPATH not properly set up?
 Key: SPARK-5857
 URL: https://issues.apache.org/jira/browse/SPARK-5857
 Project: Spark
  Issue Type: Bug
  Components: Deploy, PySpark
Affects Versions: 1.3.0
Reporter: Burak Yavuz
Priority: Blocker


Locally, I run the following command:
```bin/pyspark --py-files ~/Projects/spark-csv/python/thunder/clustering/kmeans.py```

Normally kmeans.py should then be on the PYTHONPATH, but when I check:

>>> import os
>>> os.environ['PYTHONPATH']
'~/Documents/spark/python/lib/py4j-0.8.2.1-src.zip:~/Documents/spark/python/:'

it's not there.








[jira] [Commented] (SPARK-5810) Maven Coordinate Inclusion failing in pySpark

2015-02-16 Thread Burak Yavuz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323474#comment-14323474
 ] 

Burak Yavuz commented on SPARK-5810:


Makes sense to add a regression test. I'll add it with the documentation PR 
which I'll submit today. I'll ping you on that one so that you can take a look.

 Maven Coordinate Inclusion failing in pySpark
 -

 Key: SPARK-5810
 URL: https://issues.apache.org/jira/browse/SPARK-5810
 Project: Spark
  Issue Type: Bug
  Components: Deploy, PySpark
Affects Versions: 1.3.0
Reporter: Burak Yavuz
Assignee: Josh Rosen
Priority: Blocker
 Fix For: 1.3.0


 When including maven coordinates to download dependencies in pyspark, pyspark 
 returns a GatewayError, because it cannot read the proper port to communicate 
 with the JVM. This is because pyspark relies on STDIN to read the port number 
 and in the meantime Ivy prints out a whole lot of logs.






[jira] [Created] (SPARK-5810) Maven Coordinate Inclusion failing in pySpark

2015-02-13 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-5810:
--

 Summary: Maven Coordinate Inclusion failing in pySpark
 Key: SPARK-5810
 URL: https://issues.apache.org/jira/browse/SPARK-5810
 Project: Spark
  Issue Type: Bug
  Components: Deploy, PySpark
Affects Versions: 1.3.0
Reporter: Burak Yavuz
Priority: Blocker
 Fix For: 1.3.0


When including maven coordinates to download dependencies in pyspark, pyspark 
returns a GatewayError, because it cannot read the proper port to communicate 
with the JVM. This is because pyspark relies on STDIN to read the port number 
and in the meantime Ivy prints out a whole lot of logs.






[jira] [Created] (SPARK-5811) Documentation for --packages and --repositories on Spark Shell

2015-02-13 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-5811:
--

 Summary: Documentation for --packages and --repositories on Spark 
Shell
 Key: SPARK-5811
 URL: https://issues.apache.org/jira/browse/SPARK-5811
 Project: Spark
  Issue Type: Documentation
  Components: Deploy, Spark Shell
Affects Versions: 1.3.0
Reporter: Burak Yavuz
Priority: Critical
 Fix For: 1.3.0


Documentation for the new support for dependency management using maven 
coordinates using --packages and --repositories






[Phpmyadmin-git] [phpmyadmin/phpmyadmin] 633212: Translated using Weblate (Turkish)

2015-02-12 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: 63321222cd0e528555a6353d2d4e937216ef391c
  
https://github.com/phpmyadmin/phpmyadmin/commit/63321222cd0e528555a6353d2d4e937216ef391c
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2015-02-12 (Thu, 12 Feb 2015)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (3012 of 3012 strings)

[CI skip]




[Phpmyadmin-git] [phpmyadmin/phpmyadmin] 96b710: Translated using Weblate (Turkish)

2015-02-12 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: 96b710be3d018132ab8c2cf9501ccb31d6ad2e68
  
https://github.com/phpmyadmin/phpmyadmin/commit/96b710be3d018132ab8c2cf9501ccb31d6ad2e68
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2015-02-12 (Thu, 12 Feb 2015)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (3012 of 3012 strings)

[CI skip]




Re: generate a random matrix with uniform distribution

2015-02-09 Thread Burak Yavuz
Sorry about that, yes, it should be uniformVectorRDD. Thanks Sean!
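
For reference, here is a sketch of the corrected snippet, assuming RandomRDDs.uniformVectorRDD with the same (sc, n, k, numPartitions, seed) arguments from the earlier example and rescaling each entry from U(0, 1) to U(-1, 1):
```
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.random.RandomRDDs

// uniformVectorRDD gives i.i.d. U(0, 1) entries; rescale each entry to U(-1, 1)
val uniform = RandomRDDs.uniformVectorRDD(sc, n, k, numPartitions, seed)
val data = uniform.map(v => Vectors.dense(v.toArray.map(x => 2 * x - 1)))
val matrix = new RowMatrix(data, n, k)
```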

Burak

On Mon, Feb 9, 2015 at 2:05 AM, Sean Owen so...@cloudera.com wrote:

 Yes the example given here should have used uniformVectorRDD. Then it's
 correct.

 On Mon, Feb 9, 2015 at 9:56 AM, Luca Puggini lucapug...@gmail.com wrote:
  Thanks a lot!
  Can I ask why this code generates a uniform distribution?
 
  If dist is N(0,1) data should be  N(-1, 2).
 
  Let me know.
  Thanks,
  Luca
 
  2015-02-07 3:00 GMT+00:00 Burak Yavuz brk...@gmail.com:
 
  Hi,
 
  You can do the following:
  ```
  import org.apache.spark.mllib.linalg.distributed.RowMatrix
  import org.apache.spark.mllib.random._
 
  // sc is the spark context, numPartitions is the number of partitions you
  // want the RDD to be in
  val dist: RDD[Vector] = RandomRDDs.normalVectorRDD(sc, n, k, numPartitions, seed)
  // make the distribution uniform between (-1, 1)
  val data = dist.map(_ * 2 - 1)
  val matrix = new RowMatrix(data, n, k)
  ```

  On Feb 6, 2015 11:18 AM, Donbeo lucapug...@gmail.com wrote:
 
  Hi
  I would like to know how I can generate a random matrix where each element
  comes from a uniform distribution in [-1, 1].

  In particular I would like the matrix to be a distributed row matrix with
  dimension n x p.
 
  Is this possible with mllib? Should I use another library?
 
 
 
  --
  View this message in context:
 
 http://apache-spark-user-list.1001560.n3.nabble.com/generate-a-random-matrix-with-uniform-distribution-tp21538.html
  Sent from the Apache Spark User List mailing list archive at
 Nabble.com.
 
 
 



[Phpmyadmin-git] [phpmyadmin/phpmyadmin] 09539d: Translated using Weblate (Turkish)

2015-02-09 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: 09539d41e6ce4eca62ff02b0ecd47bcbfe3c2fee
  
https://github.com/phpmyadmin/phpmyadmin/commit/09539d41e6ce4eca62ff02b0ecd47bcbfe3c2fee
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2015-02-09 (Mon, 09 Feb 2015)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (3011 of 3011 strings)

[CI skip]




[Phpmyadmin-git] [phpmyadmin/phpmyadmin] d53e06: Translated using Weblate (Turkish)

2015-02-08 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: d53e064ee7a01f8768e7453d0eef73b2921b44be
  
https://github.com/phpmyadmin/phpmyadmin/commit/d53e064ee7a01f8768e7453d0eef73b2921b44be
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2015-02-08 (Sun, 08 Feb 2015)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (3009 of 3009 strings)

[CI skip]




Re: matrix of random variables with spark.

2015-02-06 Thread Burak Yavuz
Forgot to add the more recent training material:
https://databricks-training.s3.amazonaws.com/index.html

On Fri, Feb 6, 2015 at 12:12 PM, Burak Yavuz brk...@gmail.com wrote:

 Hi Luca,

 You can tackle this using RowMatrix (spark-shell example):
 ```
 import org.apache.spark.mllib.linalg.distributed.RowMatrix
 import org.apache.spark.mllib.random._

 // sc is the spark context, numPartitions is the number of partitions you
 // want the RDD to be in
 val data: RDD[Vector] = RandomRDDs.normalVectorRDD(sc, n, k, numPartitions, seed)
 val matrix = new RowMatrix(data, n, k)
 ```

 You can find more tutorials here:
 https://spark-summit.org/2013/exercises/index.html

 Best,
 Burak




 On Fri, Feb 6, 2015 at 10:03 AM, Luca Puggini lucapug...@gmail.com
 wrote:

 Hi all,
 this is my first email with this mailing list and I hope that I am not
 doing anything wrong.

 I am currently trying to define a distributed matrix with n rows and k
 columns where each element is randomly sampled by a uniform distribution.
 How can I do that?

 It would also be nice if you could suggest any good guide that I can use
 to start working with Spark. (The quick start tutorial is not enough for me.)

 Thanks a lot !





Re: matrix of random variables with spark.

2015-02-06 Thread Burak Yavuz
Hi Luca,

You can tackle this using RowMatrix (spark-shell example):
```
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.random._

// sc is the spark context, numPartitions is the number of partitions you
// want the RDD to be in
val data: RDD[Vector] = RandomRDDs.normalVectorRDD(sc, n, k, numPartitions, seed)
val matrix = new RowMatrix(data, n, k)
```

You can find more tutorials here:
https://spark-summit.org/2013/exercises/index.html

Best,
Burak




On Fri, Feb 6, 2015 at 10:03 AM, Luca Puggini lucapug...@gmail.com wrote:

 Hi all,
 this is my first email with this mailing list and I hope that I am not
 doing anything wrong.

 I am currently trying to define a distributed matrix with n rows and k
 columns where each element is randomly sampled by a uniform distribution.
 How can I do that?

 It would also be nice if you could suggest any good guide that I can use
 to start working with Spark. (The quick start tutorial is not enough for me.)

 Thanks a lot !



Re: generate a random matrix with uniform distribution

2015-02-06 Thread Burak Yavuz
Hi,

You can do the following:
```
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.random._

// sc is the spark context, numPartitions is the number of partitions you
// want the RDD to be in
val dist: RDD[Vector] = RandomRDDs.normalVectorRDD(sc, n, k, numPartitions, seed)
// make the distribution uniform between (-1, 1)
val data = dist.map(_ * 2 - 1)
val matrix = new RowMatrix(data, n, k)
```

On Feb 6, 2015 11:18 AM, Donbeo lucapug...@gmail.com wrote:

 Hi
 I would like to know how I can generate a random matrix where each element
 comes from a uniform distribution in [-1, 1].

 In particular I would like the matrix to be a distributed row matrix with
 dimension n x p.

 Is this possible with mllib? Should I use another library?



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/generate-a-random-matrix-with-uniform-distribution-tp21538.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.





[Phpmyadmin-git] [phpmyadmin/phpmyadmin] 13d1c0: Translated using Weblate (Turkish)

2015-02-04 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: 13d1c0dacda739d0c6af60097be3788f01ca2964
  
https://github.com/phpmyadmin/phpmyadmin/commit/13d1c0dacda739d0c6af60097be3788f01ca2964
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2015-02-04 (Wed, 04 Feb 2015)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (3009 of 3009 strings)

[CI skip]




[Phpmyadmin-git] [phpmyadmin/phpmyadmin] b4d7a5: Translated using Weblate (Turkish)

2015-01-28 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: b4d7a519fa2825bf91611d98c3112679b1b5cba9
  
https://github.com/phpmyadmin/phpmyadmin/commit/b4d7a519fa2825bf91611d98c3112679b1b5cba9
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2015-01-28 (Wed, 28 Jan 2015)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (3005 of 3005 strings)

[CI skip]




[jira] [Created] (SPARK-5341) Support maven coordinates in spark-shell and spark-submit

2015-01-20 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-5341:
--

 Summary: Support maven coordinates in spark-shell and spark-submit
 Key: SPARK-5341
 URL: https://issues.apache.org/jira/browse/SPARK-5341
 Project: Spark
  Issue Type: New Feature
  Components: Deploy, Spark Shell
Reporter: Burak Yavuz


This feature will allow users to provide the Maven coordinates of jars they
wish to use in their Spark application. Coordinates can be supplied as a
comma-delimited list, like:
```spark-submit --maven org.apache.example.a,org.apache.example.b```
This feature will also be added to spark-shell (where it is even more critical
to have it).






[jira] [Created] (SPARK-5322) Add transpose() to BlockMatrix

2015-01-19 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-5322:
--

 Summary: Add transpose() to BlockMatrix
 Key: SPARK-5322
 URL: https://issues.apache.org/jira/browse/SPARK-5322
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Burak Yavuz


Once Local matrices have the option to transpose, transposing a BlockMatrix 
will be trivial. Again, this will be a flag, which will in the end affect every 
SubMatrix in the RDD.






[jira] [Created] (SPARK-5321) Add transpose() method to Matrix

2015-01-19 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-5321:
--

 Summary: Add transpose() method to Matrix
 Key: SPARK-5321
 URL: https://issues.apache.org/jira/browse/SPARK-5321
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Burak Yavuz


While we are working on BlockMatrix, it will be nice to add support for
transposing matrices. .transpose() will just modify a private flag in local
matrices. Operations that follow will be performed based on this flag.
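
To illustrate the idea, here is a minimal sketch of such a flag-based transpose (an illustration only, not the actual MLlib implementation; the class and member names are made up):
```
// Column-major local matrix with a transpose flag; transpose() copies no data.
class MiniMatrix(val numRows: Int, val numCols: Int,
                 val values: Array[Double], val isTransposed: Boolean = false) {
  // values are stored column-major for the untransposed orientation
  def apply(i: Int, j: Int): Double =
    if (!isTransposed) values(i + j * numRows) else values(j + i * numCols)

  // only the flag and the logical dimensions change
  def transpose: MiniMatrix = new MiniMatrix(numCols, numRows, values, !isTransposed)
}
```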






[Phpmyadmin-git] [phpmyadmin/phpmyadmin] 22e00e: Translated using Weblate (Turkish)

2015-01-09 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: 22e00e6a3578de1aede0ce06ef9e327c4bbe3f28
  
https://github.com/phpmyadmin/phpmyadmin/commit/22e00e6a3578de1aede0ce06ef9e327c4bbe3f28
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2015-01-09 (Fri, 09 Jan 2015)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (2996 of 2996 strings)

[CI skip]




[Phpmyadmin-git] [phpmyadmin/phpmyadmin] 6f8431: Translated using Weblate (Turkish)

2015-01-08 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: 6f8431a71d935b9710d8f5148b3941f21408052d
  
https://github.com/phpmyadmin/phpmyadmin/commit/6f8431a71d935b9710d8f5148b3941f21408052d
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2015-01-08 (Thu, 08 Jan 2015)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (2995 of 2995 strings)

[CI skip]




[Phpmyadmin-git] [phpmyadmin/phpmyadmin] 2eddd0: Translated using Weblate (Turkish)

2015-01-01 Thread Burak Yavuz
  Branch: refs/heads/QA_4_3
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: 2eddd0dc06e3f5ce3899fd2436b6b5541fcbcbfc
  
https://github.com/phpmyadmin/phpmyadmin/commit/2eddd0dc06e3f5ce3899fd2436b6b5541fcbcbfc
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2015-01-01 (Thu, 01 Jan 2015)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (2982 of 2982 strings)

[CI skip]




[Phpmyadmin-git] [phpmyadmin/phpmyadmin] 530c04: Translated using Weblate (Turkish)

2015-01-01 Thread Burak Yavuz
  Branch: refs/heads/QA_4_3
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: 530c04d14a9de6ba9b287b2a98306a09d04ee055
  
https://github.com/phpmyadmin/phpmyadmin/commit/530c04d14a9de6ba9b287b2a98306a09d04ee055
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2015-01-01 (Thu, 01 Jan 2015)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (2982 of 2982 strings)

[CI skip]




[Phpmyadmin-git] [phpmyadmin/phpmyadmin] f492a2: Translated using Weblate (Turkish)

2015-01-01 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: f492a2197d598a1618836719a47beaf16874ecfd
  
https://github.com/phpmyadmin/phpmyadmin/commit/f492a2197d598a1618836719a47beaf16874ecfd
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2015-01-01 (Thu, 01 Jan 2015)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (2993 of 2993 strings)

[CI skip]




[Phpmyadmin-git] [phpmyadmin/phpmyadmin] d26bff: Translated using Weblate (Turkish)

2015-01-01 Thread Burak Yavuz
  Branch: refs/heads/QA_4_3
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: d26bffd0ae44354c4f47e6852368c48166e1ab1f
  
https://github.com/phpmyadmin/phpmyadmin/commit/d26bffd0ae44354c4f47e6852368c48166e1ab1f
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2015-01-01 (Thu, 01 Jan 2015)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (2982 of 2982 strings)

[CI skip]




Re: null Error in ALS model predict

2014-12-24 Thread Burak Yavuz
Hi,

The MatrixFactorizationModel consists of two RDDs. When you use the second
method, Spark tries to serialize both RDDs for the .map() function, which is
not possible, because an RDD cannot be used inside a transformation on another
RDD. That is why you receive the NullPointerException. You must use the first
method.
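
For reference, a minimal sketch of the first method in Scala; the input layout follows the description quoted below, and names such as ratingsInput are assumptions for illustration:
```
import org.apache.spark.mllib.recommendation.{MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD

// ratingsInput: RDD[(Int, Int, Double, Double)] of (id, rubro, rating, resp)
def predictAll(model: MatrixFactorizationModel,
               ratingsInput: RDD[(Int, Int, Double, Double)]): RDD[Rating] = {
  val userProducts = ratingsInput.map { case (id, rubro, _, _) => (id, rubro) }
  model.predict(userProducts) // no RDD is referenced inside another RDD's map
}
```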

Best,
Burak

- Original Message -
From: Franco Barrientos franco.barrien...@exalitica.com
To: user@spark.apache.org
Sent: Wednesday, December 24, 2014 7:44:24 AM
Subject: null Error in ALS model predict

Hi all!,

 

I have an RDD[(int,int,double,double)] where the first two int values are id
and product, respectively. I trained an implicit ALS algorithm and want to
make predictions from this RDD. I tried two things, but I think both ways are
the same.

 

1-  Convert this RDD to RDD[(int,int)] and use
model.predict(RDD[(int,int)]); this works for me!

2-  Make a map and apply model.predict(int,int), for example:

val ratings = RDD[(int,int,double,double)].map { case (id, rubro, rating, resp) =>
  model.predict(id, rubro)
}

Where ratings is an RDD[Double].

 

Now, with the second way, when I apply ratings.first() I get the following error:



 

Why does this happen? I need to use this second way.

 

Thanks in advance,

 

Franco Barrientos
Data Scientist

Málaga #115, Of. 1003, Las Condes.
Santiago, Chile.
(+562)-29699649
(+569)-76347893

franco.barrien...@exalitica.com
www.exalitica.com

 



[Phpmyadmin-git] [phpmyadmin/phpmyadmin] 0e0eda: Translated using Weblate (Turkish)

2014-12-15 Thread Burak Yavuz
  Branch: refs/heads/QA_4_3
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: 0e0eda5ff1f54eb07b26e9c46db734ff1eee966c
  
https://github.com/phpmyadmin/phpmyadmin/commit/0e0eda5ff1f54eb07b26e9c46db734ff1eee966c
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-12-16 (Tue, 16 Dec 2014)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (2978 of 2978 strings)

[CI skip]




[Phpmyadmin-git] [phpmyadmin/localized_docs] 3e6f0e: Translated using Weblate (Turkish)

2014-12-09 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/localized_docs
  Commit: 3e6f0edfc6e9be3c8cd45c4cb82b8d39afe8c9e6
  
https://github.com/phpmyadmin/localized_docs/commit/3e6f0edfc6e9be3c8cd45c4cb82b8d39afe8c9e6
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-12-09 (Tue, 09 Dec 2014)

  Changed paths:
M po/tr.mo
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (1657 of 1657 strings)

[CI skip]




Re: How can I make Spark Streaming count the words in a file in a unit test?

2014-12-08 Thread Burak Yavuz
Hi,

https://github.com/databricks/spark-perf/tree/master/streaming-tests/src/main/scala/streaming/perf
contains some performance tests for streaming. There are examples of how to 
generate synthetic files during the test in that repo, maybe you
can find some code snippets that you can use there.

Best,
Burak

- Original Message -
From: Emre Sevinc emre.sev...@gmail.com
To: user@spark.apache.org
Sent: Monday, December 8, 2014 2:36:41 AM
Subject: How can I make Spark Streaming count the words in a file in a unit 
test?

Hello,

I've successfully built a very simple Spark Streaming application in Java
that is based on the HdfsCount example in Scala at
https://github.com/apache/spark/blob/branch-1.1/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala
.

When I submit this application to my local Spark, it waits for a file to be
written to a given directory, and when I create that file it successfully
prints the number of words. I terminate the application by pressing Ctrl+C.

Now I've tried to create a very basic unit test for this functionality, but
in the test I was not able to print the same information, that is the
number of words.

What am I missing?

Below is the unit test file, and after that I've also included the code
snippet that shows the countWords method:

=
StarterAppTest.java
=
import com.google.common.io.Files;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;


import org.junit.*;

import java.io.*;

public class StarterAppTest {

  JavaStreamingContext ssc;
  File tempDir;

  @Before
  public void setUp() {
    ssc = new JavaStreamingContext("local", "test", new Duration(3000));
tempDir = Files.createTempDir();
tempDir.deleteOnExit();
  }

  @After
  public void tearDown() {
ssc.stop();
ssc = null;
  }

  @Test
  public void testInitialization() {
Assert.assertNotNull(ssc.sc());
  }


  @Test
  public void testCountWords() {

StarterApp starterApp = new StarterApp();

try {
      JavaDStream<String> lines =
        ssc.textFileStream(tempDir.getAbsolutePath());
      JavaPairDStream<String, Integer> wordCounts =
        starterApp.countWords(lines);

      System.err.println("= Word Counts ===");
      wordCounts.print();
      System.err.println("= Word Counts ===");

      ssc.start();

      File tmpFile = new File(tempDir.getAbsolutePath(), "tmp.txt");
      PrintWriter writer = new PrintWriter(tmpFile, "UTF-8");
      writer.println("8-Dec-2014: Emre Emre Emre Ergin Ergin Ergin");
      writer.close();

      System.err.println("= Word Counts ===");
      wordCounts.print();
      System.err.println("= Word Counts ===");

} catch (FileNotFoundException e) {
  e.printStackTrace();
} catch (UnsupportedEncodingException e) {
  e.printStackTrace();
}


Assert.assertTrue(true);

  }

}
=

This test compiles and starts to run; Spark Streaming prints a lot of
diagnostic messages on the console, but the calls to wordCounts.print()
do not print anything, whereas in StarterApp.java itself they do.

I've also added ssc.awaitTermination(); after ssc.start() but nothing
changed in that respect. After that I've also tried to create a new file in
the directory that this Spark Streaming application was checking but this
time it gave an error.

For completeness, below is the wordCounts method:


public JavaPairDStream<String, Integer> countWords(JavaDStream<String> lines) {
    JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
      @Override
      public Iterable<String> call(String x) {
        return Lists.newArrayList(SPACE.split(x));
      }
    });

    JavaPairDStream<String, Integer> wordCounts = words.mapToPair(
        new PairFunction<String, String, Integer>() {
          @Override
          public Tuple2<String, Integer> call(String s) {
            return new Tuple2<>(s, 1);
          }
        }).reduceByKey((i1, i2) -> i1 + i2);

    return wordCounts;
  }




Kind regards
Emre Sevinç





[Phpmyadmin-git] [phpmyadmin/phpmyadmin] eed0ff: Translated using Weblate (Turkish)

2014-11-26 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: eed0ffa96b6ee739036175912c32fca25985bead
  
https://github.com/phpmyadmin/phpmyadmin/commit/eed0ffa96b6ee739036175912c32fca25985bead
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-11-26 (Wed, 26 Nov 2014)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (2991 of 2991 strings)

[CI skip]




[jira] [Created] (SPARK-4409) Additional (but limited) Linear Algebra Utils

2014-11-14 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-4409:
--

 Summary: Additional (but limited) Linear Algebra Utils
 Key: SPARK-4409
 URL: https://issues.apache.org/jira/browse/SPARK-4409
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Reporter: Burak Yavuz
Priority: Minor









[jira] [Updated] (SPARK-4409) Additional (but limited) Linear Algebra Utils

2014-11-14 Thread Burak Yavuz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz updated SPARK-4409:
---
Description: 
This ticket is to discuss the addition of a very limited number of local matrix 
manipulation and generation methods that would be helpful in the further 
development for algorithms on top of BlockMatrix (SPARK-3974), such as 
Randomized SVD, and Multi Model Training (SPARK-1486).
The proposed methods for addition are:
For `Matrix`
 -  map: maps the values in the matrix with a given function. Produces a new 
matrix.
 -  update: the values in the matrix are updated with a given function. Occurs 
in place.
Factory methods for `DenseMatrix`:
 -  *zeros: Generate a matrix consisting of zeros
 -  *ones: Generate a matrix consisting of ones
 -  *eye: Generate an identity matrix
 -  *rand: Generate a matrix consisting of i.i.d. uniform random numbers
 -  *randn: Generate a matrix consisting of i.i.d. gaussian random numbers
 -  *diag: Generate a diagonal matrix from a supplied vector
*These methods already exist in the factory methods for `Matrices`, however for 
cases where we require a `DenseMatrix`, you constantly have to add 
`.asInstanceOf[DenseMatrix]` everywhere, which makes the code dirtier. I 
propose moving these functions to factory methods for `DenseMatrix` where the 
output will be a `DenseMatrix` and the factory methods for `Matrices` will call 
these functions directly and output a generic `Matrix`.

Factory methods for `SparseMatrix`:
 -  speye: Identity matrix in sparse format. Saves a ton of memory when 
dimensions are large, especially in Multi Model Training, where each row 
requires being multiplied by a scalar.
 -  sprand: Generate a sparse matrix with a given density consisting of i.i.d. 
uniform random numbers.
 -  sprandn: Generate a sparse matrix with a given density consisting of i.i.d. 
gaussian random numbers.
 -  diag: Generate a diagonal matrix from a supplied vector, but is memory 
efficient, because it just stores the diagonal. Again, very helpful in Multi 
Model Training.

Factory methods for `Matrices`:
 -  Include all the factory methods given above, but return a generic `Matrix` 
rather than `SparseMatrix` or `DenseMatrix`.
 -  horzCat: Horizontally concatenate matrices to form one larger matrix. Very 
useful in both Multi Model Training, and for the repartitioning of BlockMatrix.
 -  vertCat: Vertically concatenate matrices to form one larger matrix. Very 
useful for the repartitioning of BlockMatrix.
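
For illustration only, a hedged sketch of how the proposed factory methods might be used, assuming they land with roughly the signatures described above (method casing may differ, e.g. horzcat):
```
import java.util.Random
import org.apache.spark.mllib.linalg.{DenseMatrix, Matrices, Matrix, Vectors}

val rng = new Random(42)
val a = DenseMatrix.rand(3, 4, rng)              // dense result, no .asInstanceOf[DenseMatrix]
val b = DenseMatrix.zeros(3, 4)
val d = Matrices.diag(Vectors.dense(1.0, 2.0, 3.0))
val wide = Matrices.horzcat(Array[Matrix](a, b)) // 3 x 8 generic Matrix
```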

 Additional (but limited) Linear Algebra Utils
 -

 Key: SPARK-4409
 URL: https://issues.apache.org/jira/browse/SPARK-4409
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Reporter: Burak Yavuz
Priority: Minor

 This ticket is to discuss the addition of a very limited number of local 
 matrix manipulation and generation methods that would be helpful in the 
 further development for algorithms on top of BlockMatrix (SPARK-3974), such 
 as Randomized SVD, and Multi Model Training (SPARK-1486).
 The proposed methods for addition are:
 For `Matrix`
  -  map: maps the values in the matrix with a given function. Produces a new 
 matrix.
  -  update: the values in the matrix are updated with a given function. 
 Occurs in place.
 Factory methods for `DenseMatrix`:
  -  *zeros: Generate a matrix consisting of zeros
  -  *ones: Generate a matrix consisting of ones
  -  *eye: Generate an identity matrix
  -  *rand: Generate a matrix consisting of i.i.d. uniform random numbers
  -  *randn: Generate a matrix consisting of i.i.d. gaussian random numbers
  -  *diag: Generate a diagonal matrix from a supplied vector
 *These methods already exist in the factory methods for `Matrices`, however 
 for cases where we require a `DenseMatrix`, you constantly have to add 
 `.asInstanceOf[DenseMatrix]` everywhere, which makes the code dirtier. I 
 propose moving these functions to factory methods for `DenseMatrix` where the 
 output will be a `DenseMatrix` and the factory methods for `Matrices` will 
 call these functions directly and output a generic `Matrix`.
 Factory methods for `SparseMatrix`:
  -  speye: Identity matrix in sparse format. Saves a ton of memory when 
 dimensions are large, especially in Multi Model Training, where each row 
 requires being multiplied by a scalar.
  -  sprand: Generate a sparse matrix with a given density consisting of 
 i.i.d. uniform random numbers.
  -  sprandn: Generate a sparse matrix with a given density consisting of 
 i.i.d. gaussian random numbers.
  -  diag: Generate a diagonal matrix from a supplied vector, but is memory 
 efficient, because it just stores the diagonal. Again, very helpful in Multi 
 Model Training.
 Factory methods for `Matrices`:
  -  Include all the factory methods given above, but return

[Phpmyadmin-git] [phpmyadmin/phpmyadmin] 346b62: Translated using Weblate (Turkish)

2014-11-04 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: 346b62740ab25f5d325f4aa74aeadd8aad7236c4
  
https://github.com/phpmyadmin/phpmyadmin/commit/346b62740ab25f5d325f4aa74aeadd8aad7236c4
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-11-04 (Tue, 04 Nov 2014)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (2976 of 2976 strings)

[CI skip]




[jira] [Commented] (SPARK-3974) Block matrix abstractions and partitioners

2014-10-31 Thread Burak Yavuz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192731#comment-14192731
 ] 

Burak Yavuz commented on SPARK-3974:


Hi everyone,
The design doc for Block Matrix abstractions and the work on matrix 
multiplication can be found here:
goo.gl/zbU1Nz

Let me know if you have any comments / suggestions. I will have the PR for this 
ready by next Friday hopefully.

 Block matrix abstractions and partitioners
 --

 Key: SPARK-3974
 URL: https://issues.apache.org/jira/browse/SPARK-3974
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Reporter: Reza Zadeh
Assignee: Burak Yavuz

 We need abstractions for block matrices with fixed block sizes, with each 
 block being dense. Partitioners along both rows and columns required.
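
As a rough illustration of the kind of partitioner being asked for (a sketch only; the names and the grouping scheme are assumptions, not the design in the linked doc), a grid partitioner can map (blockRow, blockCol) keys to partitions:
```
import org.apache.spark.Partitioner

// Groups rowsPerPart x colsPerPart sub-matrix blocks into a single partition.
class GridPartitioner(numRowBlocks: Int, numColBlocks: Int,
                      rowsPerPart: Int, colsPerPart: Int) extends Partitioner {
  private val rowParts = math.ceil(numRowBlocks.toDouble / rowsPerPart).toInt
  private val colParts = math.ceil(numColBlocks.toDouble / colsPerPart).toInt

  override val numPartitions: Int = rowParts * colParts

  override def getPartition(key: Any): Int = key match {
    case (blockRow: Int, blockCol: Int) =>
      (blockRow / rowsPerPart) + (blockCol / colsPerPart) * rowParts
    case _ => throw new IllegalArgumentException(s"Unexpected key: $key")
  }
}
```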






Re: MLLib ALS ArrayIndexOutOfBoundsException with Scala Spark 1.1.0

2014-10-27 Thread Burak Yavuz
Hi,

I've come across this multiple times, but not in a consistent manner. I found 
it hard to reproduce. I have a jira for it: SPARK-3080

Do you observe this error every single time? Where do you load your data from? 
Which version of Spark are you running? 
Figuring out the similarities may help in pinpointing the bug.

Thanks,
Burak

- Original Message -
From: Ilya Ganelin ilgan...@gmail.com
To: user user@spark.apache.org
Sent: Monday, October 27, 2014 11:36:46 AM
Subject: MLLib ALS ArrayIndexOutOfBoundsException with Scala Spark 1.1.0

Hello all - I am attempting to run MLLib's ALS algorithm on a substantial
test vector - approx. 200 million records.

I have resolved a few issues I've had with regards to garbage collection,
KryoSeralization, and memory usage.

I have not been able to get around this issue I see below however:


 java.lang.ArrayIndexOutOfBoundsException: 6106

 org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateBlock$1.apply$mcVI$sp(ALS.
 scala:543)
 scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
 org.apache.spark.mllib.recommendation.ALS.org
 $apache$spark$mllib$recommendation$ALS$$updateBlock(ALS.scala:537)

 org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:505)

 org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:504)

 org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)

 org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
 scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

 org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:144)

 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)

 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158)

 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)

 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)

 org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)


I do not have any negative indices or indices that exceed Int-Max.

I have partitioned the input data into 300 partitions and my Spark config
is below:

.set("spark.executor.memory", "14g")
  .set("spark.storage.memoryFraction", "0.8")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "MyRegistrator")
  .set("spark.core.connection.ack.wait.timeout", "600")
  .set("spark.akka.frameSize", "50")
  .set("spark.yarn.executor.memoryOverhead", "1024")

Does anyone have any suggestions as to why I'm seeing the above error or
how to get around it?
It may be possible to upgrade to the latest version of Spark but the
mechanism for doing so in our environment isn't obvious yet.

-Ilya Ganelin





[Phpmyadmin-git] [phpmyadmin/phpmyadmin] 7180bb: Translated using Weblate (Turkish)

2014-10-16 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: 7180bb0f150e81dc6ceb0ff1e582bd85fdb69306
  
https://github.com/phpmyadmin/phpmyadmin/commit/7180bb0f150e81dc6ceb0ff1e582bd85fdb69306
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-10-16 (Thu, 16 Oct 2014)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (2973 of 2973 strings)

[CI skip]




Re: Spark KMeans hangs at reduceByKey / collectAsMap

2014-10-14 Thread Burak Yavuz
Hi Ray,

The reduceByKey / collectAsMap does a lot of calculations. Therefore it can 
take a very long time if:
1) The parameter number of runs is set very high
2) k is set high (you have observed this already)
3) data is not properly repartitioned
It seems that it is hanging, but there is a lot of calculation going on.

Did you use a different value for the number of runs?
If you look at the storage tab, does the data look balanced among executors?
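
For reference, a hedged sketch of the knobs mentioned above (k, runs, and repartitioning); the names and values are purely illustrative:
```
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// data: the RDD of sparse feature vectors
def train(data: RDD[Vector]): KMeansModel = {
  val balanced = data.repartition(200).cache() // spread the points evenly across executors
  new KMeans()
    .setK(100)            // a large k multiplies the per-point distance computations
    .setRuns(1)           // every additional run repeats the whole clustering
    .setMaxIterations(20)
    .run(balanced)
}
```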

Best,
Burak

- Original Message -
From: Ray ray-w...@outlook.com
To: u...@spark.incubator.apache.org
Sent: Tuesday, October 14, 2014 2:58:03 PM
Subject: Re: Spark KMeans hangs at reduceByKey / collectAsMap

Hi Xiangrui,

The input dataset has 1.5 million sparse vectors. Each sparse vector has a
dimension(cardinality) of 9153 and has less than 15 nonzero elements.


Yes, if I set num-executors = 200, from the hadoop cluster scheduler, I can
see the application got  201 vCores. From the spark UI, I can see it got 201
executors (as shown below).

http://apache-spark-user-list.1001560.n3.nabble.com/file/n16428/spark_core.png
  

http://apache-spark-user-list.1001560.n3.nabble.com/file/n16428/spark_executor.png
 



Thanks.

Ray




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-KMeans-hangs-at-reduceByKey-collectAsMap-tp16413p16428.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




[Phpmyadmin-git] [phpmyadmin/localized_docs] ac28de: Translated using Weblate (Turkish)

2014-10-12 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/localized_docs
  Commit: ac28dedf064d6f2064afb17d3311a929edd95dad
  
https://github.com/phpmyadmin/localized_docs/commit/ac28dedf064d6f2064afb17d3311a929edd95dad
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-10-12 (Sun, 12 Oct 2014)

  Changed paths:
M po/tr.mo
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (1656 of 1656 strings)

[CI skip]




[jira] [Commented] (SPARK-3434) Distributed block matrix

2014-10-10 Thread Burak Yavuz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167152#comment-14167152
 ] 

Burak Yavuz commented on SPARK-3434:


[~ConcreteVitamin], any updates? Anything I can help out with?

 Distributed block matrix
 

 Key: SPARK-3434
 URL: https://issues.apache.org/jira/browse/SPARK-3434
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Xiangrui Meng

 This JIRA is for discussing distributed matrices stored in block 
 sub-matrices. The main challenge is the partitioning scheme to allow adding 
 linear algebra operations in the future, e.g.:
 1. matrix multiplication
 2. matrix factorization (QR, LU, ...)
 Let's discuss the partitioning and storage and how they fit into the above 
 use cases.
 Questions:
 1. Should it be backed by a single RDD that contains all of the sub-matrices 
 or many RDDs with each contains only one sub-matrix?






[Phpmyadmin-git] [phpmyadmin/phpmyadmin] 2079e9: Translated using Weblate (Turkish)

2014-10-06 Thread Burak Yavuz
  Branch: refs/heads/QA_4_2
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: 2079e9cd9abf4d76e50494ce4bf8f7c1d4999164
  
https://github.com/phpmyadmin/phpmyadmin/commit/2079e9cd9abf4d76e50494ce4bf8f7c1d4999164
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-10-06 (Mon, 06 Oct 2014)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (2768 of 2768 strings)

[CI skip]




[Phpmyadmin-git] [phpmyadmin/localized_docs] 38df14: Translated using Weblate (Turkish)

2014-10-02 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/localized_docs
  Commit: 38df143ca748c7a5236c70cb0c715ea948195184
  
https://github.com/phpmyadmin/localized_docs/commit/38df143ca748c7a5236c70cb0c715ea948195184
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-10-02 (Thu, 02 Oct 2014)

  Changed paths:
M po/tr.mo
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (1648 of 1648)

[ci skip]




[Phpmyadmin-git] [phpmyadmin/phpmyadmin] 5288df: Translated using Weblate (Turkish)

2014-10-02 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: 5288df43097df61237fe4d9320a56b0886ed11db
  
https://github.com/phpmyadmin/phpmyadmin/commit/5288df43097df61237fe4d9320a56b0886ed11db
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-10-02 (Thu, 02 Oct 2014)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (2965 of 2965 strings)

[CI skip]




[Phpmyadmin-git] [phpmyadmin/localized_docs] 1169c4: Translated using Weblate (Turkish)

2014-10-02 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/localized_docs
  Commit: 1169c49661f124d4d617d1316d62404d598d30bf
  
https://github.com/phpmyadmin/localized_docs/commit/1169c49661f124d4d617d1316d62404d598d30bf
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-10-02 (Thu, 02 Oct 2014)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (1657 of 1657 strings)

[CI skip]




Re: MLlib Linear Regression Mismatch

2014-10-01 Thread Burak Yavuz
Hi,

It appears that the step size is so high that the model diverges once the
noise is added.
Could you try setting the step size to 0.1 or 0.01?
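
For reference, a hedged sketch of the same call with a smaller step size, written in Scala for brevity (the pyspark train method takes a similar step parameter); the numbers are illustrative:
```
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

val data = sc.parallelize(Seq(
  LabeledPoint(0.0, Vectors.dense(0.0)),
  LabeledPoint(9.0, Vectors.dense(10.0)),
  LabeledPoint(22.0, Vectors.dense(20.0)),
  LabeledPoint(32.0, Vectors.dense(30.0))))

// train(input, numIterations, stepSize): a smaller step size helps keep SGD from diverging
val model = LinearRegressionWithSGD.train(data, 100, 0.01)
println(model.weights)
```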

Best,
Burak

- Original Message -
From: Krishna Sankar ksanka...@gmail.com
To: user@spark.apache.org
Sent: Wednesday, October 1, 2014 12:43:20 PM
Subject: MLlib Linear Regression Mismatch

Guys,
Obviously I am doing something wrong. Maybe 4 points are too small a dataset.
Can you help me figure out why the following doesn't work?
a) This works :

data = [
   LabeledPoint(0.0, [0.0]),
   LabeledPoint(10.0, [10.0]),
   LabeledPoint(20.0, [20.0]),
   LabeledPoint(30.0, [30.0])
]
lrm = LinearRegressionWithSGD.train(sc.parallelize(data),
initialWeights=array([1.0]))
print lrm
print lrm.weights
print lrm.intercept
lrm.predict([40])

output:
<pyspark.mllib.regression.LinearRegressionModel object at 0x109813d50>

[ 1.]
0.0

40.0

b) By perturbing the y a little bit, the model gives wrong results:

data = [
   LabeledPoint(0.0, [0.0]),
   LabeledPoint(9.0, [10.0]),
   LabeledPoint(22.0, [20.0]),
   LabeledPoint(32.0, [30.0])
]
lrm = LinearRegressionWithSGD.train(sc.parallelize(data),
initialWeights=array([1.0])) # should be 1.09x -0.60
print lrm
print lrm.weights
print lrm.intercept
lrm.predict([40])

Output:
<pyspark.mllib.regression.LinearRegressionModel object at 0x109666590>

[ -8.20487463e+203]
0.0

-3.2819498532740317e+205

c) Same story here - wrong results. Actually nan:

data = [
   LabeledPoint(18.9, [3910.0]),
   LabeledPoint(17.0, [3860.0]),
   LabeledPoint(20.0, [4200.0]),
   LabeledPoint(16.6, [3660.0])
]
lrm = LinearRegressionWithSGD.train(sc.parallelize(data),
initialWeights=array([1.0])) # should be ~ 0.006582x -7.595170
print lrm
print lrm.weights
print lrm.intercept
lrm.predict([4000])

Output: <pyspark.mllib.regression.LinearRegressionModel object at 0x109666b90>

[ nan]
0.0

nan

Cheers & Thanks
k/


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



[Phpmyadmin-git] [phpmyadmin/localized_docs] 1c004d: Translated using Weblate (Turkish)

2014-09-22 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/localized_docs
  Commit: 1c004d7e341e8e0d4b5c17dcdc64181220725193
  
https://github.com/phpmyadmin/localized_docs/commit/1c004d7e341e8e0d4b5c17dcdc64181220725193
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-09-22 (Mon, 22 Sep 2014)

  Changed paths:
M po/tr.mo
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (1648 of 1648)

[ci skip]




[jira] [Commented] (SPARK-3631) Add docs for checkpoint usage

2014-09-22 Thread Burak Yavuz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143484#comment-14143484
 ] 

Burak Yavuz commented on SPARK-3631:


Thanks for setting this up [~aash]! [~pwendell], [~tdas], [~joshrosen] could 
you please confirm/correct/add to my explanation above. Thanks!

 Add docs for checkpoint usage
 -

 Key: SPARK-3631
 URL: https://issues.apache.org/jira/browse/SPARK-3631
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 1.1.0
Reporter: Andrew Ash
Assignee: Andrew Ash

 We should include general documentation on using checkpoints.  Right now the 
 docs only cover checkpoints in the Spark Streaming use case which is slightly 
 different from Core.
 Some content to consider for inclusion from [~brkyvz]:
 {quote}
 If you set the checkpointing directory however, the intermediate state of the 
 RDDs will be saved in HDFS, and the lineage will pick up from there.
 You won't need to keep the shuffle data before the checkpointed state, 
 therefore those can be safely removed (will be removed automatically).
 However, checkpoint must be called explicitly as in 
 https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala#L291
  ,just setting the directory will not be enough.
 {quote}
 {quote}
 Yes, writing to HDFS is more expensive, but I feel it is still a small price 
 to pay when compared to having a Disk Space Full error three hours in
 and having to start from scratch.
 The main goal of checkpointing is to truncate the lineage. Clearing up 
 shuffle writes come as a bonus to checkpointing, it is not the main goal. The
 subtlety here is that .checkpoint() is just like .cache(). Until you call an 
 action, nothing happens. Therefore, if you're going to do 1000 maps in a
 row and you don't want to checkpoint in the meantime until a shuffle happens, 
 you will still get a StackOverflowError, because the lineage is too long.
 I went through some of the code for checkpointing. As far as I can tell, it 
 materializes the data in HDFS, and resets all its dependencies, so you start
 a fresh lineage. My understanding would be that checkpointing still should be 
 done every N operations to reset the lineage. However, an action must be
 performed before the lineage grows too long.
 {quote}
 A good place to put this information would be at 
 https://spark.apache.org/docs/latest/programming-guide.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Python version of kmeans

2014-09-18 Thread Burak Yavuz
Hi,

spark-1.0.1/examples/src/main/python/kmeans.py = Naive example for users to 
understand how to code in Spark
spark-1.0.1/python/pyspark/mllib/clustering.py = Use this!!!

Bonus: spark-1.0.1/examples/src/main/python/mllib/kmeans.py = Example on how 
to call KMeans. Feel free to use it as a template!

Best,
Burak

- Original Message -
From: MEETHU MATHEW meethu2...@yahoo.co.in
To: user@spark.apache.org
Sent: Wednesday, September 17, 2014 10:26:40 PM
Subject: Python version of kmeans

Hi all,

I need the kmeans code written against PySpark for some testing purposes.
Can somebody tell me the difference between these two files?

 spark-1.0.1/examples/src/main/python/kmeans.py   and 

 spark-1.0.1/python/pyspark/mllib/clustering.py


Thanks & Regards, 
Meethu M


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Odd error when using a rdd map within a stream map

2014-09-18 Thread Burak Yavuz
Hi,

I believe it's because you're trying to use a Function of an RDD, in an RDD, 
which is not possible. Instead of using a
`Function<JavaRDD<Float>, Void>`, could you try `Function<Float, Void>`, and
`public Void call(Float arg0) throws Exception {`
and
`System.out.println(arg0)`

instead. I'm not perfectly sure of the semantics in Java, but this should be 
what you're actually trying to do.

Best,
Burak

- Original Message -
From: Filip Andrei andreis.fi...@gmail.com
To: u...@spark.incubator.apache.org
Sent: Thursday, September 18, 2014 6:57:21 AM
Subject: Odd error when using a rdd map within a stream map

here i wrote a simpler version of the code to get an understanding of how it
works:

final List<NeuralNet> nns = new ArrayList<NeuralNet>();
for (int i = 0; i < numberOfNets; i++) {
    nns.add(NeuralNet.createFrom(...));
}

final JavaRDD<NeuralNet> nnRdd = sc.parallelize(nns);
JavaDStream<Float> results = rndLists.flatMap(new FlatMapFunction<Map<String, Object>, Float>() {
    @Override
    public Iterable<Float> call(Map<String, Object> input) throws Exception {

        Float f = nnRdd.map(new Function<NeuralNet, Float>() {

            @Override
            public Float call(NeuralNet nn) throws Exception {
                return 1.0f;
            }
        }).reduce(new Function2<Float, Float, Float>() {

            @Override
            public Float call(Float left, Float right) throws Exception {
                return left + right;
            }
        });

        return Arrays.asList(f);
    }
});
results.print();


This works as expected and print() simply shows the number of neural nets I have.
If instead of print() I use:

results.foreach(new Function<JavaRDD<Float>, Void>() {

    @Override
    public Void call(JavaRDD<Float> arg0) throws Exception {

        for (Float f : arg0.collect()) {
            System.out.println(f);
        }
        return null;
    }
});

It fails with the following exception
org.apache.spark.SparkException: Job aborted due to stage failure: Task
1.0:0 failed 1 times, most recent failure: Exception failure in TID 1 on
host localhost: java.lang.NullPointerException 
org.apache.spark.rdd.RDD.map(RDD.scala:270)

This is weird to me since the same code executes as expected in one case and
doesn't in the other, any idea what's going on here ?




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Odd-error-when-using-a-rdd-map-within-a-stream-map-tp14551.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org





Re: Spark on EC2

2014-09-18 Thread Burak Yavuz
Hi Gilberto,

Could you please attach the driver logs as well, so that we can pinpoint what's 
going wrong? Could you also add the flag
`--driver-memory 4g` while submitting your application and try that as well?

Best,
Burak

- Original Message -
From: Gilberto Lira g...@scanboo.com.br
To: user@spark.apache.org
Sent: Thursday, September 18, 2014 11:48:03 AM
Subject: Spark on EC2

Hello, I am trying to run a Python script that makes use of the MLlib KMeans and 
I'm not getting anywhere. I'm using a c3.xlarge instance as master and 10 
c3.large instances as slaves. In the code I map over a 600MB CSV file in 
S3, where each row has 128 integer columns. The problem is that around TID 7 
my slave stops responding, and I cannot finish my processing. Could you help 
me with this problem? I am sending my script attached for review. 

Thank you, 
Gilberto 



-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org





[Phpmyadmin-git] [phpmyadmin/localized_docs] a01814: Translated using Weblate (Turkish)

2014-09-17 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/localized_docs
  Commit: a01814147d950fa4fa4a4a9006a7c5690a9701b6
  
https://github.com/phpmyadmin/localized_docs/commit/a01814147d950fa4fa4a4a9006a7c5690a9701b6
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-09-17 (Wed, 17 Sep 2014)

  Changed paths:
M po/tr.mo
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (1653 of 1653)

[ci skip]




Re: Spark and disk usage.

2014-09-17 Thread Burak Yavuz
Hi,

The files you mentioned are temporary files written by Spark during shuffling. 
ALS will write a LOT of those files as it is a shuffle heavy algorithm.
Those files will be deleted after your program completes as Spark looks for 
those files in case a fault occurs. Having those files ready allows Spark to 
continue from the stage the shuffle left off, instead of starting from the very 
beginning.

Long story short, it's to your benefit that Spark writes those files to disk. 
If you don't want Spark writing to disk, you can specify a checkpoint directory 
in
HDFS, where Spark will write the current status instead and will clean up files 
from disk.

Best,
Burak

- Original Message -
From: Макар Красноперов connector@gmail.com
To: user@spark.apache.org
Sent: Wednesday, September 17, 2014 7:37:49 AM
Subject: Spark and disk usage.

Hello everyone.

The problem is that spark write data to the disk very hard, even if
application has a lot of free memory (about 3.8g).
So, I've noticed that folder with name like
spark-local-20140917165839-f58c contains a lot of other folders with
files like shuffle_446_0_1. The total size of files in the dir
spark-local-20140917165839-f58c can reach 1.1g.
Sometimes its size decreases (are there only temp files in that folder?),
so the totally amount of data written to the disk is greater than 1.1g.

The question is what kind of data Spark store there and can I make spark
not to write it on the disk and just keep it in the memory if there is
enough RAM free space?

I run my job locally with Spark 1.0.1:
./bin/spark-submit --driver-memory 12g --master local[3] --properties-file
conf/spark-defaults.conf --class my.company.Main /path/to/jar/myJob.jar

spark-defaults.conf :
spark.shuffle.spill false
spark.reducer.maxMbInFlight 1024
spark.shuffle.file.buffer.kb2048
spark.storage.memoryFraction0.7

The situation with disk usage is common for many jobs. I had also used ALS
from MLIB and saw the similar things.

I had reached no success by playing with spark configuration and i hope
someone can help me :)


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark and disk usage.

2014-09-17 Thread Burak Yavuz
Hi Andrew,

Yes, I'm referring to sparkContext.setCheckpointDir(). It has the same effect 
as in Spark Streaming. 
For example, in an algorithm like ALS, the RDDs go through many transformations 
and the lineage of the RDD starts to grow drastically just like 
the lineage of DStreams do in Spark Streaming. You may observe 
StackOverflowErrors in ALS if you set the number of iterations to be very high. 

If you set the checkpointing directory however, the intermediate state of the 
RDDs will be saved in HDFS, and the lineage will pick up from there. 
You won't need to keep the shuffle data before the checkpointed state, 
therefore those can be safely removed (will be removed automatically).
However, checkpoint must be called explicitly as in 
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala#L291
 ,just setting the directory will not be enough.
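As a minimal sketch (hypothetical HDFS paths, assuming a spark-shell so sc is in scope):

sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")   // must be visible to every executor

val data = sc.textFile("hdfs:///data/input.txt").map(_.split("\t")(0))
data.cache()       // avoids recomputing the RDD when the checkpoint is written
data.checkpoint()  // marks the RDD; setting the directory alone is not enough
data.count()       // the action materializes the RDD, writes it to HDFS and truncates the lineage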

Best,
Burak

- Original Message -
From: Andrew Ash and...@andrewash.com
To: Burak Yavuz bya...@stanford.edu
Cc: Макар Красноперов connector@gmail.com, user 
user@spark.apache.org
Sent: Wednesday, September 17, 2014 10:19:42 AM
Subject: Re: Spark and disk usage.

Hi Burak,

Most discussions of checkpointing in the docs is related to Spark
streaming.  Are you talking about the sparkContext.setCheckpointDir()?
 What effect does that have?

https://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing

On Wed, Sep 17, 2014 at 7:44 AM, Burak Yavuz bya...@stanford.edu wrote:

 Hi,

 The files you mentioned are temporary files written by Spark during
 shuffling. ALS will write a LOT of those files as it is a shuffle heavy
 algorithm.
 Those files will be deleted after your program completes as Spark looks
 for those files in case a fault occurs. Having those files ready allows
 Spark to
 continue from the stage the shuffle left off, instead of starting from the
 very beginning.

 Long story short, it's to your benefit that Spark writes those files to
 disk. If you don't want Spark writing to disk, you can specify a checkpoint
 directory in
 HDFS, where Spark will write the current status instead and will clean up
 files from disk.

 Best,
 Burak

 - Original Message -
 From: Макар Красноперов connector@gmail.com
 To: user@spark.apache.org
 Sent: Wednesday, September 17, 2014 7:37:49 AM
 Subject: Spark and disk usage.

 Hello everyone.

 The problem is that spark write data to the disk very hard, even if
 application has a lot of free memory (about 3.8g).
 So, I've noticed that folder with name like
 spark-local-20140917165839-f58c contains a lot of other folders with
 files like shuffle_446_0_1. The total size of files in the dir
 spark-local-20140917165839-f58c can reach 1.1g.
 Sometimes its size decreases (are there only temp files in that folder?),
 so the totally amount of data written to the disk is greater than 1.1g.

 The question is what kind of data Spark store there and can I make spark
 not to write it on the disk and just keep it in the memory if there is
 enough RAM free space?

 I run my job locally with Spark 1.0.1:
 ./bin/spark-submit --driver-memory 12g --master local[3] --properties-file
 conf/spark-defaults.conf --class my.company.Main /path/to/jar/myJob.jar

 spark-defaults.conf :
 spark.shuffle.spill false
 spark.reducer.maxMbInFlight 1024
 spark.shuffle.file.buffer.kb2048
 spark.storage.memoryFraction0.7

 The situation with disk usage is common for many jobs. I had also used ALS
 from MLIB and saw the similar things.

 I had reached no success by playing with spark configuration and i hope
 someone can help me :)






-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark and disk usage.

2014-09-17 Thread Burak Yavuz
Yes, writing to HDFS is more expensive, but I feel it is still a small price to 
pay when compared to having a Disk Space Full error three hours in
and having to start from scratch.

The main goal of checkpointing is to truncate the lineage. Clearing up shuffle 
writes come as a bonus to checkpointing, it is not the main goal. The 
subtlety here is that .checkpoint() is just like .cache(). Until you call an 
action, nothing happens. Therefore, if you're going to do 1000 maps in a 
row and you don't want to checkpoint in the meantime until a shuffle happens, 
you will still get a StackOverflowError, because the lineage is too long.

I went through some of the code for checkpointing. As far as I can tell, it 
materializes the data in HDFS, and resets all its dependencies, so you start 
a fresh lineage. My understanding would be that checkpointing still should be 
done every N operations to reset the lineage. However, an action must be 
performed before the lineage grows too long.
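A rough sketch of that pattern (the interval and the transformations are illustrative, and it assumes sc.setCheckpointDir(...) has already been called):

var rdd = sc.parallelize(1 to 1000000).map(_.toDouble)
val checkpointInterval = 50  // illustrative

for (i <- 1 to 1000) {
  rdd = rdd.map(_ * 1.001)
  if (i % checkpointInterval == 0) {
    rdd.cache()
    rdd.checkpoint()  // lazy, just like cache()
    rdd.count()       // the action forces the checkpoint before the lineage grows too long
  }
}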

I believe it would be nice to write up checkpointing in the programming guide. 
The reason that it's not there yet I believe is that most applications don't
grow such a long lineage, except in Spark Streaming, and some MLlib algorithms. 
If you can help with the guide, I think it would be a nice feature to have!

Burak


- Original Message -
From: Andrew Ash and...@andrewash.com
To: Burak Yavuz bya...@stanford.edu
Cc: Макар Красноперов connector@gmail.com, user 
user@spark.apache.org
Sent: Wednesday, September 17, 2014 11:04:02 AM
Subject: Re: Spark and disk usage.

Thanks for the info!

Are there performance impacts with writing to HDFS instead of local disk?
 I'm assuming that's why ALS checkpoints every third iteration instead of
every iteration.

Also I can imagine that checkpointing should be done every N shuffles
instead of every N operations (counting maps), since only the shuffle
leaves data on disk.  Do you have any suggestions on this?

We should write up some guidance on the use of checkpointing in the programming
guide https://spark.apache.org/docs/latest/programming-guide.html - I can
help with this

Andrew


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Size exceeds Integer.MAX_VALUE in BlockFetcherIterator

2014-09-17 Thread Burak Yavuz
Hi,

Could you try repartitioning the data with .repartition(# of cores on the machine) or, 
while reading the data, supplying a minimum number of partitions as in
sc.textFile(path, # of cores on the machine)?

It may be that the whole dataset is stored in a single block. If it is billions of 
rows, indexing into that block will not work, giving the Size exceeds 
Integer.MAX_VALUE error.
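For example, a quick sketch (the partition count is illustrative; pick it from your data size and core count):

// supply a minimum number of partitions while reading ...
val data = sc.textFile("hdfs:///data/huge_input", 200)

// ... or repartition an existing RDD so no single block has to hold more than 2 GB
val repartitioned = data.repartition(200)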

Best,
Burak

- Original Message -
From: francisco ftanudj...@nextag.com
To: u...@spark.incubator.apache.org
Sent: Wednesday, September 17, 2014 3:18:29 PM
Subject: Size exceeds Integer.MAX_VALUE in BlockFetcherIterator

Hi,

We are running aggregation on a huge data set (few billion rows).
While running the task got the following error (see below). Any ideas?
Running spark 1.1.0 on cdh distribution.

...
14/09/17 13:33:30 INFO Executor: Finished task 0.0 in stage 1.0 (TID 0).
2083 bytes result sent to driver
14/09/17 13:33:30 INFO CoarseGrainedExecutorBackend: Got assigned task 1
14/09/17 13:33:30 INFO Executor: Running task 0.0 in stage 2.0 (TID 1)
14/09/17 13:33:30 INFO TorrentBroadcast: Started reading broadcast variable
2
14/09/17 13:33:30 INFO MemoryStore: ensureFreeSpace(1428) called with
curMem=163719, maxMem=34451478282
14/09/17 13:33:30 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes
in memory (estimated size 1428.0 B, free 32.1 GB)
14/09/17 13:33:30 INFO BlockManagerMaster: Updated info of block
broadcast_2_piece0
14/09/17 13:33:30 INFO TorrentBroadcast: Reading broadcast variable 2 took
0.027374294 s
14/09/17 13:33:30 INFO MemoryStore: ensureFreeSpace(2336) called with
curMem=165147, maxMem=34451478282
14/09/17 13:33:30 INFO MemoryStore: Block broadcast_2 stored as values in
memory (estimated size 2.3 KB, free 32.1 GB)
14/09/17 13:33:30 INFO MapOutputTrackerWorker: Updating epoch to 1 and
clearing cache
14/09/17 13:33:30 INFO MapOutputTrackerWorker: Don't have map outputs for
shuffle 1, fetching them
14/09/17 13:33:30 INFO MapOutputTrackerWorker: Doing the fetch; tracker
actor =
Actor[akka.tcp://sparkdri...@sas-model1.pv.sv.nextag.com:56631/user/MapOutputTracker#794212052]
14/09/17 13:33:30 INFO MapOutputTrackerWorker: Got the output locations
14/09/17 13:33:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/09/17 13:33:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
Getting 1 non-empty blocks out of 1 blocks
14/09/17 13:33:30 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
Started 0 remote fetches in 8 ms
14/09/17 13:33:30 ERROR BlockFetcherIterator$BasicBlockFetcherIterator:
Error occurred while fetching local blocks
java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:828)
at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:104)
at org.apache.spark.storage.DiskStore.getValues(DiskStore.scala:120)
at
org.apache.spark.storage.BlockManager.getLocalFromDisk(BlockManager.scala:358)
at
org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator$$anonfun$getLocalBlocks$1.apply(BlockFetcherIterator.scala:208)
at
org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator$$anonfun$getLocalBlocks$1.apply(BlockFetcherIterator.scala:205)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator.getLocalBlocks(BlockFetcherIterator.scala:205)
at
org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator.initialize(BlockFetcherIterator.scala:240)
at
org.apache.spark.storage.BlockManager.getMultiple(BlockManager.scala:583)
at
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.fetch(BlockStoreShuffleFetcher.scala:77)
at
org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:41)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/09/17 13:33:30 INFO 

Re: MLLib: LIBSVM issue

2014-09-17 Thread Burak Yavuz
Hi,

The spacing between the inputs should be a single space, not a tab. I feel like 
your inputs have tabs between them instead of a single space. Therefore the 
parser
cannot parse the input.
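For reference, a well-formed file looks like the lines sketched below (the indices are made up), and it loads with a single call:

// each line: label index1:value1 index2:value2 ...  (single spaces, no tabs), e.g.
// 1 122:1 693:1 1771:1 1974:1
// 0 150:1 214:1 468:1

import org.apache.spark.mllib.util.MLUtils
val examples = MLUtils.loadLibSVMFile(sc, "path/to/your/libsvm_data.txt")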

Best,
Burak

- Original Message -
From: Sameer Tilak ssti...@live.com
To: user@spark.apache.org
Sent: Wednesday, September 17, 2014 7:25:10 PM
Subject: MLLib: LIBSVM issue

Hi All,

We have a fairly large amount of sparse data. I was following these instructions in the manual:

Sparse data
It is very common in practice to have sparse training data. MLlib supports reading training examples stored in LIBSVM format, which is the default format used by LIBSVM and LIBLINEAR. It is a text format in which each line represents a labeled sparse feature vector using the following format:
label index1:value1 index2:value2 ...

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

val examples: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

I believe that I have formatted my data as per the required LIBSVM format. Here 
is a snippet of that:
1122:11693:11771:11974:12334:1
2378:12562:1 1118:11389:11413:11454:1   
 1780:12562:15051:15417:15548:1
5798:15862:1 0150:1214:1468:11013:1 
   1078:11092:11117:11489:11546:11630:1 
   1635:11827:12024:12215:12478:1
2761:15985:16115:16218:1 0251:15578:1 
However, when I use MLUtils.loadLibSVMFile(sc, "path-to-data-file") I get the 
following error messages in my spark-shell. Can someone please point me in the 
right direction?
java.lang.NumberFormatException: For input string: 150:1214:1
468:11013:11078:11092:11117:11489:1 
   1546:11630:11635:11827:12024:12215:1 
   2478:12761:15985:16115:16218:1 
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241) 
at java.lang.Double.parseDouble(Double.java:540) at 
scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:232)  
 


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: [mllib] State of Multi-Model training

2014-09-16 Thread Burak Yavuz
Hi Kyle,

I'm actively working on it now. It's pretty close to completion, I'm just 
trying to figure out bottlenecks and optimize as much as possible.
As Phase 1, I implemented multi model training on Gradient Descent. Instead of 
performing Vector-Vector operations on rows (examples) and weights,
I've batched them into matrices so that we can use Level 3 BLAS to speed things 
up. I've also added support for Sparse Matrices 
(https://github.com/apache/spark/pull/2294) as making use of sparsity will 
allow you to train more models at once.
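To make the batching idea concrete, here is a rough standalone sketch with Breeze (sizes are illustrative; this is not the actual MLlib internal code):

import breeze.linalg.DenseMatrix

val numExamples = 64    // rows in one batch
val numFeatures = 100
val numModels   = 8     // models trained at the same time

val examples = DenseMatrix.rand(numExamples, numFeatures)
val weights  = DenseMatrix.rand(numFeatures, numModels)   // one weight vector per column

// a single Level-3 BLAS multiply (gemm) scores every example against every model,
// instead of numExamples * numModels separate dot products
val margins = examples * weights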

Best,
Burak

- Original Message -
From: Kyle Ellrott kellr...@soe.ucsc.edu
To: dev@spark.apache.org
Sent: Tuesday, September 16, 2014 3:21:53 PM
Subject: [mllib] State of Multi-Model training

I'm curious about the state of development Multi-Model learning in MLlib
(training sets of models during the same training session, rather then one
at a time). The JIRA lists it as in progress targeting Spark 1.2.0 (
https://issues.apache.org/jira/browse/SPARK-1486 ). But there hasn't been
any notes on it in over a month.
I submitted a pull request for a possible method to do this work a little
over two months ago (https://github.com/apache/spark/pull/1292), but
haven't yet received any feedback on the patch yet.
Is anybody else working on multi-model training?

Kyle


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Spark SQL

2014-09-14 Thread Burak Yavuz
Hi,

I'm not a master of Spark SQL, but from what I understand, the problem is that 
you're trying to access an RDD
inside an RDD here: val xyz = file.map(line => *** 
extractCurRate(sqlContext.sql(select rate ... *** and 
here:  xyz = file.map(line => *** extractCurRate(sqlContext.sql(select rate 
... ***.
RDDs can't be serialized inside other RDD tasks, therefore you're receiving the 
NullPointerException.

More specifically, you are trying to generate a SchemaRDD inside an RDD, which 
you can't do.

If file isn't huge, you can call .collect() to transform the RDD to an array 
and then use .map() on the Array.

If the file is huge, then you may do number 3 first, join the two RDDs using 
'txCurCode' as a key, and then do filtering
operations, etc...
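A rough sketch of the collect() route, reusing extractCurRate and the offsets from your snippet (the query is trimmed here for brevity):

val lines = file.collect()   // only if the file comfortably fits on the driver

val xyz = lines.map { line =>
  val txCurCode = line.substring(202, 205)
  // build and run the query on the driver, where sqlContext is usable
  extractCurRate(sqlContext.sql(
    "select rate from CurrencyCodeRates where txCurCode = '" + txCurCode +
      "' order by effectiveDate desc"))
}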

Best,
Burak

- Original Message -
From: rkishore999 rkishore...@yahoo.com
To: u...@spark.incubator.apache.org
Sent: Saturday, September 13, 2014 10:29:26 PM
Subject: Spark SQL

val file =
sc.textFile("hdfs://ec2-54-164-243-97.compute-1.amazonaws.com:9010/user/fin/events.txt")

1. val xyz = file.map(line => extractCurRate(sqlContext.sql("select rate
from CurrencyCodeRates where txCurCode = '" + line.substring(202,205) + "'
and fxCurCode = '" + fxCurCodesMap(line.substring(77,82)) + "' and
effectiveDate = '" + line.substring(221,229) + "' order by effectiveDate
desc")))

2. val xyz = file.map(line => sqlContext.sql("select rate, txCurCode,
fxCurCode, effectiveDate from CurrencyCodeRates where txCurCode = 'USD' and
fxCurCode = 'CSD' and effectiveDate = '20140901' order by effectiveDate
desc"))

3. val xyz = sqlContext.sql("select rate, txCurCode, fxCurCode,
effectiveDate from CurrencyCodeRates where txCurCode = 'USD' and fxCurCode =
'CSD' and effectiveDate = '20140901' order by effectiveDate desc")

xyz.saveAsTextFile("/user/output")

In statements 1 and 2 I'm getting nullpointer expecption. But statement 3 is
good. I'm guessing spark context and sql context are not going together
well.

Any suggestions regarding how I can achieve this?






--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-tp14183.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org





Re: Filter function problem

2014-09-09 Thread Burak Yavuz
Hi,

val test = persons.value
  .map{tuple => (tuple._1, tuple._2
  .filter{event => *** inactiveIDs.filter(event2 => event2._1 == 
tuple._1).count() != 0 *** })}

Your problem is right between the asterisks. You can't run an RDD operation 
inside an RDD operation, because RDDs can't be serialized. 
Therefore you are receiving the NullPointerException. Try joining the RDDs 
based on `event` and then filtering based on that.
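If inactiveIDs is small, another option is to collect its keys once and filter locally, so no RDD operation is nested inside another. A rough sketch (types assumed from the snippet above):

// collect the inactive keys once, then filter locally inside the map
val inactiveKeys = inactiveIDs.map(_._1).collect().toSet

val test = persons.value.map { case (id, events) =>
  (id, events.filter(_ => inactiveKeys.contains(id)))
}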

Best,
Burak

- Original Message -
From: Blackeye black...@iit.demokritos.gr
To: u...@spark.incubator.apache.org
Sent: Tuesday, September 9, 2014 3:34:58 AM
Subject: Re: Filter function problem

To help anyone answer, I should say that I checked the
inactiveIDs.filter operation separately, and I found that it doesn't return
null in any case. In addition, I don't know how to handle (or check) whether an RDD
is null. I find the debugging too complicated to pinpoint the error. Any ideas
on how to find the null pointer? 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Filter-function-problem-tp13787p13789.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org





[Phpmyadmin-git] [phpmyadmin/localized_docs] 6d551e: Translated using Weblate (Turkish)

2014-09-07 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/localized_docs
  Commit: 6d551e2fce7ea6e02e1194acc6a800a1af836b5b
  
https://github.com/phpmyadmin/localized_docs/commit/6d551e2fce7ea6e02e1194acc6a800a1af836b5b
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-09-07 (Sun, 07 Sep 2014)

  Changed paths:
M po/tr.mo
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 99.8% (1651 of 1653)

[ci skip]




[jira] [Created] (SPARK-3418) Additional BLAS and Local Sparse Matrix support

2014-09-05 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-3418:
--

 Summary: Additional BLAS and Local Sparse Matrix support
 Key: SPARK-3418
 URL: https://issues.apache.org/jira/browse/SPARK-3418
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Burak Yavuz


Currently MLlib doesn't have Level-2 and Level-3 BLAS support. For Multi-Model 
training, adding support for Level-3 BLAS functions is vital.

In addition, as most real data is sparse, support for Local Sparse Matrices 
will also be added, as supporting sparse matrices will save a lot of memory and 
will lead to better performance. The ability to left multiply a dense matrix 
with a sparse matrix, i.e. `C := alpha * A * B + beta * C` where `A` is a 
sparse matrix will also be added. However, `B` and `C` will remain as Dense 
Matrices for now.

I will post performance comparisons with other libraries that support sparse 
matrices such as Breeze and Matrix-toolkits-JAVA (MTJ) in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3418) [MLlib] Additional BLAS and Local Sparse Matrix support

2014-09-05 Thread Burak Yavuz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz updated SPARK-3418:
---
Summary: [MLlib] Additional BLAS and Local Sparse Matrix support  (was: 
Additional BLAS and Local Sparse Matrix support)

 [MLlib] Additional BLAS and Local Sparse Matrix support
 ---

 Key: SPARK-3418
 URL: https://issues.apache.org/jira/browse/SPARK-3418
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Burak Yavuz

 Currently MLlib doesn't have Level-2 and Level-3 BLAS support. For 
 Multi-Model training, adding support for Level-3 BLAS functions is vital.
 In addition, as most real data is sparse, support for Local Sparse Matrices 
 will also be added, as supporting sparse matrices will save a lot of memory 
 and will lead to better performance. The ability to left multiply a dense 
 matrix with a sparse matrix, i.e. `C := alpha * A * B + beta * C` where `A` 
 is a sparse matrix will also be added. However, `B` and `C` will remain as 
 Dense Matrices for now.
 I will post performance comparisons with other libraries that support sparse 
 matrices such as Breeze and Matrix-toolkits-JAVA (MTJ) in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[Phpmyadmin-git] [phpmyadmin/localized_docs] 1e0179: Translated using Weblate (Turkish)

2014-08-30 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/localized_docs
  Commit: 1e0179a5b88ed87450de23a73fac265a686d0476
  
https://github.com/phpmyadmin/localized_docs/commit/1e0179a5b88ed87450de23a73fac265a686d0476
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-08-30 (Sat, 30 Aug 2014)

  Changed paths:
M po/tr.mo
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 99.8% (1650 of 1653)

[ci skip]




[jira] [Updated] (SPARK-3280) Made sort-based shuffle the default implementation

2014-08-28 Thread Burak Yavuz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz updated SPARK-3280:
---

Attachment: hash-sort-comp.png

 Made sort-based shuffle the default implementation
 --

 Key: SPARK-3280
 URL: https://issues.apache.org/jira/browse/SPARK-3280
 Project: Spark
  Issue Type: Improvement
Reporter: Reynold Xin
Assignee: Reynold Xin
 Attachments: hash-sort-comp.png


 sort-based shuffle has lower memory usage and seems to outperform hash-based 
 in almost all of our testing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3280) Made sort-based shuffle the default implementation

2014-08-28 Thread Burak Yavuz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114873#comment-14114873
 ] 

Burak Yavuz commented on SPARK-3280:


I don't have as detailed a comparison as Josh has, but for MLlib algorithms, 
sort-based shuffle didn't show the performance boosts Josh has shown. 16 
m3.2xlarge instances were used for these experiments. The difference here is 
that the number of partitions I used was 128, much less than the number of 
partitions Josh has shown.

!hash-sort-comp.png!

 Made sort-based shuffle the default implementation
 --

 Key: SPARK-3280
 URL: https://issues.apache.org/jira/browse/SPARK-3280
 Project: Spark
  Issue Type: Improvement
Reporter: Reynold Xin
Assignee: Reynold Xin
 Attachments: hash-sort-comp.png


 sort-based shuffle has lower memory usage and seems to outperform hash-based 
 in almost all of our testing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-28 Thread Burak Yavuz
+1. Tested MLlib algorithms on Amazon EC2, algorithms show speed-ups between 
1.5-5x compared to the 1.0.2 release.

- Original Message -
From: Patrick Wendell pwend...@gmail.com
To: dev@spark.apache.org
Sent: Thursday, August 28, 2014 8:32:11 PM
Subject: Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

I'll kick off the vote with a +1.

On Thu, Aug 28, 2014 at 7:14 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.1.0!

 The tag to be voted on is v1.1.0-rc2 (commit 711aebb3):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=711aebb329ca28046396af1e34395a0df92b5327

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.1.0-rc2/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1029/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.1.0-rc2-docs/

 Please vote on releasing this package as Apache Spark 1.1.0!

 The vote is open until Monday, September 01, at 03:11 UTC and passes if
 a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.1.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == Regressions fixed since RC1 ==
 LZ4 compression issue: https://issues.apache.org/jira/browse/SPARK-3277

 == What justifies a -1 vote for this release? ==
 This vote is happening very late into the QA period compared with
 previous votes, so -1 votes should only occur for significant
 regressions from 1.0.2. Bugs already present in 1.0.X will not block
 this release.

 == What default changes should I be aware of? ==
 1. The default value of spark.io.compression.codec is now snappy
 -- Old behavior can be restored by switching to lzf

 2. PySpark now performs external spilling during aggregations.
 -- Old behavior can be restored by setting spark.shuffle.spill to false.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org






Re: OutofMemoryError when generating output

2014-08-28 Thread Burak Yavuz
Yeah, saveAsTextFile is an RDD specific method. If you really want to use that 
method, just turn the map into an RDD:

`sc.parallelize(x.toSeq).saveAsTextFile(...)`

Reading through the api-docs will present you many more alternate solutions!

Best,
Burak

- Original Message -
From: SK skrishna...@gmail.com
To: u...@spark.incubator.apache.org
Sent: Thursday, August 28, 2014 12:45:22 PM
Subject: Re: OutofMemoryError when generating output

Hi,
Thanks for the response. I tried to use countByKey. But I am not able to
write the output to console or to a file. Neither collect() nor
saveAsTextFile() work for the Map object that is generated after
countByKey(). 

val x = sc.textFile(baseFile).map { line =>
  val fields = line.split("\t")
  (fields(11), fields(6)) // extract (month, user_id)
}.distinct().countByKey()

x.saveAsTextFile(...)  // does not work; generates an error that
saveAsTextFile is not defined for a Map object


Is there a way to convert the Map object to an object that I can output to
console and to a file?

thanks




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/OutofMemoryError-when-generating-output-tp12847p13056.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org





Re: Memory statistics in the Application detail UI

2014-08-28 Thread Burak Yavuz
Hi,

By default, Spark uses approximately 60% of the executor heap memory to store 
RDDs. That's why you see 8.6 GB instead of 16 GB. 95.5 GB is therefore the sum of 
all the executors' 8.6 GB storage allocations plus the driver memory.
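For reference, that fraction is controlled by a configuration property (a minimal sketch; 0.6 is, as far as I know, the default in this Spark version):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("memory-fraction-sketch")
  .set("spark.storage.memoryFraction", "0.6")  // fraction of executor heap reserved for cached RDDs
val sc = new SparkContext(conf)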

Best,
Burak

- Original Message -
From: SK skrishna...@gmail.com
To: u...@spark.incubator.apache.org
Sent: Thursday, August 28, 2014 6:32:32 PM
Subject: Memory statistics in the Application detail UI

Hi,

I am using a cluster where each node has 16GB (this is the executor memory).
After I complete an MLlib job, the executor tab shows the following:

Memory: 142.6 KB Used (95.5 GB Total) 

and individual worker nodes have the Memory Used values as 17.3 KB / 8.6 GB 
(this is different for different nodes). What does the second number signify
(i.e.  8.6 GB and 95.5 GB)? If 17.3 KB was used out of the total memory of
the node, should it not be 17.3 KB/16 GB?

thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Memory-statistics-in-the-Application-detail-UI-tp13082.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org






[Phpmyadmin-git] [phpmyadmin/phpmyadmin] d0e0ed: Translated using Weblate (Turkish)

2014-08-27 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: d0e0ed047816fa84ce213df88be75670c765eeb5
  
https://github.com/phpmyadmin/phpmyadmin/commit/d0e0ed047816fa84ce213df88be75670c765eeb5
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-08-27 (Wed, 27 Aug 2014)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (2964 of 2964)

[ci skip]




Re: Amplab: big-data-benchmark

2014-08-27 Thread Burak Yavuz
Hi Sameer,

I've faced this issue before. They don't show up on 
http://s3.amazonaws.com/big-data-benchmark/. But you can directly use: 
`sc.textFile("s3n://big-data-benchmark/pavlo/text/tiny/crawl")`
The gotcha is that you also need to supply which dataset you want: crawl, 
uservisits, or rankings in lower case after the format and size you want them 
in.
They should be there.

Best,
Burak

- Original Message -
From: Sameer Tilak ssti...@live.com
To: user@spark.apache.org
Sent: Wednesday, August 27, 2014 11:42:28 AM
Subject: Amplab: big-data-benchmark

Hi All,
I am planning to run the AMPLab benchmark suite to evaluate the performance of our 
cluster. I looked at https://amplab.cs.berkeley.edu/benchmark/ and it mentions 
data availability at:
s3n://big-data-benchmark/pavlo/[text|text-deflate|sequence|sequence-snappy]/[suffix]
where /tiny/, /1node/ and /5nodes/ are options for suffix. However, I am not able to 
download these datasets directly. I read that they can be 
used directly by doing sc.textFile(s3:/), but I wanted to make sure 
that my understanding is correct. Here is what I see at 
http://s3.amazonaws.com/big-data-benchmark/
I do not see anything for sequence or text-deflate.
I see sequence-snappy dataset:
<Contents><Key>pavlo/sequence-snappy/5nodes/crawl/000738_0</Key><LastModified>2013-05-27T21:26:40.000Z</LastModified><ETag>a978d18721d5a533d38a88f558461644</ETag><Size>42958735</Size><StorageClass>STANDARD</StorageClass></Contents>
For text, I get the following error:
<Error><Code>NoSuchKey</Code><Message>The specified key does not 
exist.</Message><Key>pavlo/text/1node/crawl</Key><RequestId>166D239D38399526</RequestId><HostId>4Bg8BHomWqJ6BXOkx/3fQZhN5Uw1TtCn01uQzm+1qYffx2s/oPV+9sGoAWV2thCI</HostId></Error>

Please let me know if there is a way to readily download the dataset and view 
it. 


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: OutofMemoryError when generating output

2014-08-26 Thread Burak Yavuz
Hi,

The error doesn't occur during saveAsTextFile but rather during the groupByKey 
as far as I can tell. We strongly urge users to not use groupByKey
if they don't have to. What I would suggest is the following work-around:
sc.textFile(baseFile).map { line =>
  val fields = line.split("\t")
  (fields(11), fields(6)) // extract (month, user_id)
}.distinct().countByKey()

instead

Best,
Burak


- Original Message -
From: SK skrishna...@gmail.com
To: u...@spark.incubator.apache.org
Sent: Tuesday, August 26, 2014 12:38:00 PM
Subject: OutofMemoryError when generating output

Hi,

I have the following piece of code that I am running on a cluster with 10
nodes with 2GB memory per node. The tasks seem to complete, but at the point
where it is generating output (saveAsTextFile), the program freezes after
some time and reports an out of memory error (error transcript attached
below). I also tried using collect() and printing the output to console
instead of a file, but got the same error. The program reads some logs for a
month and extracts the number of unique users during the month. The reduced
output is not very large, so not sure why the memory error occurs. I would
appreciate any help in fixing this memory error to get the output. Thanks.

def main(args: Array[String]) {

    val conf = new SparkConf().setAppName("App")
    val sc = new SparkContext(conf)

    // get the number of users per month
    val user_time = sc.union(sc.textFile(baseFile))
      .map(line => {
        val fields = line.split("\t")
        (fields(11), fields(6))
      })                                  // extract (month, user_id)
      .groupByKey                         // group by month as the key
      .map(g => (g._1, g._2.toSet.size))  // get the unique id count per month
    //  .collect()
    // user_time.foreach(f => println(f))

    user_time.map(f => "%s, %s".format(f._1, f._2)).saveAsTextFile("app_output")
    sc.stop()
}






14/08/26 15:21:15 WARN TaskSetManager: Loss was due to
java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: GC overhead limit exceeded
at
org.apache.spark.util.collection.ExternalAppendOnlyMap.insert(ExternalAppendOnlyMap.scala:121)
at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:60)
at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$4.apply(PairRDDFunctions.scala:107)
at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$4.apply(PairRDDFunctions.scala:106)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at 
org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
at org.apache.spark.scheduler.Task.run(Task.scala:51)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/OutofMemoryError-when-generating-output-tp12847.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org





Re: saveAsTextFile hangs with hdfs

2014-08-26 Thread Burak Yavuz
Hi David, 

Your job is probably hanging on the groupByKey process. Probably GC is kicking 
in and the process starts to hang or the data is unbalanced and you end up with 
stragglers (Once GC kicks in you'll start to get the connection errors you 
shared). If you don't care about the list of values itself, but the count of it 
(that appears to be what you're trying to save, correct me if I'm wrong), then 
I would suggest using `countByKey()` directly on 
`JavaPairRDDString, AnalyticsLogFlyweight partitioned`. 

Best, 
Burak 

- Original Message -

From: David david.b...@gmail.com 
To: user u...@spark.incubator.apache.org 
Sent: Tuesday, August 19, 2014 1:44:18 PM 
Subject: saveAsTextFile hangs with hdfs 

I have a simple spark job that seems to hang when saving to hdfs. When looking 
at the spark web ui, the job reached 97 of 100 tasks completed. I need some 
help determining why the job appears to hang. The job hangs on the 
saveAsTextFile() call. 


https://www.dropbox.com/s/fdp7ck91hhm9w68/Screenshot%202014-08-19%2010.53.24.png
 

The job is pretty simple: 

JavaRDD<String> analyticsLogs = context 
    .textFile(Joiner.on(",").join(hdfs.glob("/spark-dfs", ".*\\.log$")), 4); 

JavaRDD<AnalyticsLogFlyweight> flyweights = analyticsLogs 
    .map(line -> { 
        try { 
            AnalyticsLog log = GSON.fromJson(line, AnalyticsLog.class); 
            AnalyticsLogFlyweight flyweight = new AnalyticsLogFlyweight(); 
            flyweight.ipAddress = log.getIpAddress(); 
            flyweight.time = log.getTime(); 
            flyweight.trackingId = log.getTrackingId(); 
            return flyweight; 

        } catch (Exception e) { 
            LOG.error("error parsing json", e); 
            return null; 
        } 
    }); 


JavaRDD<AnalyticsLogFlyweight> filtered = flyweights 
    .filter(log -> log != null); 

JavaPairRDD<String, AnalyticsLogFlyweight> partitioned = filtered 
    .mapToPair((AnalyticsLogFlyweight log) -> new Tuple2<>(log.trackingId, log)) 
    .partitionBy(new HashPartitioner(100)).cache(); 


Ordering<AnalyticsLogFlyweight> ordering = 
    Ordering.natural().nullsFirst().onResultOf(new Function<AnalyticsLogFlyweight, Long>() { 
        public Long apply(AnalyticsLogFlyweight log) { 
            return log.time; 
        } 
    }); 

JavaPairRDD<String, Iterable<AnalyticsLogFlyweight>> stringIterableJavaPairRDD 
    = partitioned.groupByKey(); 
JavaPairRDD<String, Integer> stringIntegerJavaPairRDD = 
    stringIterableJavaPairRDD.mapToPair((log) -> { 
        List<AnalyticsLogFlyweight> sorted = Lists.newArrayList(log._2()); 
        sorted.forEach(l -> LOG.info("sorted {}", l)); 
        return new Tuple2<>(log._1(), sorted.size()); 
    }); 

String outputPath = "/summarized/groupedByTrackingId4"; 
hdfs.rm(outputPath, true); 
stringIntegerJavaPairRDD.saveAsTextFile(String.format("%s/%s", hdfs.getUrl(), outputPath)); 


Thanks in advance, David 



[Phpmyadmin-git] [phpmyadmin/phpmyadmin] 9cedf0: Translated using Weblate (Turkish)

2014-08-25 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: 9cedf0a58feaad7604cdf7d09828854c11c630e6
  
https://github.com/phpmyadmin/phpmyadmin/commit/9cedf0a58feaad7604cdf7d09828854c11c630e6
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-08-25 (Mon, 25 Aug 2014)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (2962 of 2962)

[ci skip]




Re: Finding Rank in Spark

2014-08-23 Thread Burak Yavuz
Spearman's correlation requires the calculation of ranks for columns. You can 
check out the code here and slice out the part you need!

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/SpearmanCorrelation.scala
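If the ranks are only a means to an end, Spearman's correlation itself is exposed through the public API. A minimal sketch (assuming Spark 1.1+ and a spark-shell, so sc is in scope):

import org.apache.spark.mllib.stat.Statistics

val x = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0, 5.0))
val y = sc.parallelize(Seq(2.0, 4.0, 5.0, 9.0, 12.0))

// ranks each RDD internally, then computes Pearson correlation on the ranks
val rho = Statistics.corr(x, y, "spearman")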

Best,
Burak

- Original Message -
From: athiradas athira@flutura.com
To: u...@spark.incubator.apache.org
Sent: Friday, August 22, 2014 4:14:34 AM
Subject: Re: Finding Rank in Spark

Does anyone know a way to do this?

I tried it by sorting it and writing an auto-increment function.

But since the computation is parallel, the result is wrong.

Is there any way? Please reply.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Finding-Rank-in-Spark-tp12028p12647.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org





Re: LDA example?

2014-08-22 Thread Burak Yavuz
You can check out this pull request: https://github.com/apache/spark/pull/476

LDA is on the roadmap for the 1.2 release, hopefully we will officially support 
it then!

Best,
Burak

- Original Message -
From: Denny Lee denny.g@gmail.com
To: user@spark.apache.org
Sent: Thursday, August 21, 2014 10:10:35 PM
Subject: LDA example?

Quick question - is there a handy sample / example of how to use the LDA 
algorithm within Spark MLLib?  

Thanks!
Denny



-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



[Phpmyadmin-git] [phpmyadmin/phpmyadmin] b574ff: Translated using Weblate (Romanian)

2014-08-16 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: b574fff77c983dd103bc53ae1751030cbb7c17c1
  
https://github.com/phpmyadmin/phpmyadmin/commit/b574fff77c983dd103bc53ae1751030cbb7c17c1
  Author: Alecsandru Prună alecsandrucpr...@gmail.com
  Date:   2014-08-16 (Sat, 16 Aug 2014)

  Changed paths:
M po/ro.po

  Log Message:
  ---
  Translated using Weblate (Romanian)

Currently translated at 57.7% (1713 of 2964)

[ci skip]


  Commit: 75a7f0f9791bd737b4c6619879b82956ae1e3bfe
  
https://github.com/phpmyadmin/phpmyadmin/commit/75a7f0f9791bd737b4c6619879b82956ae1e3bfe
  Author: Weblate nore...@weblate.org
  Date:   2014-08-16 (Sat, 16 Aug 2014)

  Changed paths:
M test/libraries/PMA_SetupIndex_test.php

  Log Message:
  ---
  Merge remote-tracking branch 'origin/master'


  Commit: 21a01002926cd479b2e2592b4fbea827509fed14
  
https://github.com/phpmyadmin/phpmyadmin/commit/21a01002926cd479b2e2592b4fbea827509fed14
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-08-16 (Sat, 16 Aug 2014)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (2964 of 2964)

[ci skip]


Compare: 
https://github.com/phpmyadmin/phpmyadmin/compare/5c7c5ba5337d...21a01002926c


[jira] [Created] (SPARK-3080) ArrayIndexOutOfBoundsException in ALS for Large datasets

2014-08-15 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-3080:
--

 Summary: ArrayIndexOutOfBoundsException in ALS for Large datasets
 Key: SPARK-3080
 URL: https://issues.apache.org/jira/browse/SPARK-3080
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Reporter: Burak Yavuz


The stack trace is below:

```
java.lang.ArrayIndexOutOfBoundsException: 2716

org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateBlock$1.apply$mcVI$sp(ALS.scala:543)
scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)

org.apache.spark.mllib.recommendation.ALS.org$apache$spark$mllib$recommendation$ALS$$updateBlock(ALS.scala:537)

org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:505)

org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:504)

org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)

org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:138)

org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)

org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158)

scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)

scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)

org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
```
This happened after the dataset was sub-sampled. 
Dataset properties: ~12B ratings




--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3080) ArrayIndexOutOfBoundsException in ALS for Large datasets

2014-08-15 Thread Burak Yavuz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz updated SPARK-3080:
---

Description: 
The stack trace is below:

{quote}
java.lang.ArrayIndexOutOfBoundsException: 2716

org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateBlock$1.apply$mcVI$sp(ALS.scala:543)
scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)

org.apache.spark.mllib.recommendation.ALS.org$apache$spark$mllib$recommendation$ALS$$updateBlock(ALS.scala:537)

org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:505)

org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:504)

org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)

org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:138)

org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)

org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158)

scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)

scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)

org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
{quote}
This happened after the dataset was sub-sampled. 
Dataset properties: ~12B ratings
Setup: 55 r3.8xlarge ec2 instances

  was:
The stack trace is below:

{quote}
java.lang.ArrayIndexOutOfBoundsException: 2716

org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateBlock$1.apply$mcVI$sp(ALS.scala:543)
scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)

org.apache.spark.mllib.recommendation.ALS.org$apache$spark$mllib$recommendation$ALS$$updateBlock(ALS.scala:537)

org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:505)

org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:504)

org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)

org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:138)

org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)

org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158)

scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)

scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229

[jira] [Updated] (SPARK-3080) ArrayIndexOutOfBoundsException in ALS for Large datasets

2014-08-15 Thread Burak Yavuz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz updated SPARK-3080:
---

Description: 
The stack trace is below:

{quote}
java.lang.ArrayIndexOutOfBoundsException: 2716

org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateBlock$1.apply$mcVI$sp(ALS.scala:543)
scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)

org.apache.spark.mllib.recommendation.ALS.org$apache$spark$mllib$recommendation$ALS$$updateBlock(ALS.scala:537)

org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:505)

org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:504)

org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)

org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:138)

org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)

org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158)

scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)

scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)

org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
{quote}
This happened after the dataset was sub-sampled. 
Dataset properties: ~12B ratings


  was:
The stack trace is below:

```
java.lang.ArrayIndexOutOfBoundsException: 2716

org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateBlock$1.apply$mcVI$sp(ALS.scala:543)
scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)

org.apache.spark.mllib.recommendation.ALS.org$apache$spark$mllib$recommendation$ALS$$updateBlock(ALS.scala:537)

org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:505)

org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:504)

org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)

org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:138)

org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)

org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158)

scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)

scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)

org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31

[Phpmyadmin-git] [phpmyadmin/phpmyadmin] 2b61fb: Translated using Weblate (Turkish)

2014-08-13 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: 2b61fb1281e094a885f580f83d8381c7cca8bb04
  
https://github.com/phpmyadmin/phpmyadmin/commit/2b61fb1281e094a885f580f83d8381c7cca8bb04
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-08-13 (Wed, 13 Aug 2014)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (2961 of 2961)

[ci skip]


--
___
Phpmyadmin-git mailing list
Phpmyadmin-git@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/phpmyadmin-git


[jira] [Resolved] (SPARK-2833) performance tests for linear regression

2014-08-12 Thread Burak Yavuz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz resolved SPARK-2833.


Resolution: Fixed

 performance tests for linear regression
 ---

 Key: SPARK-2833
 URL: https://issues.apache.org/jira/browse/SPARK-2833
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib
Reporter: Xiangrui Meng
Assignee: Burak Yavuz

 linear regression, lasso, and ridge



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2837) performance tests for ALS

2014-08-12 Thread Burak Yavuz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz resolved SPARK-2837.


Resolution: Done

 performance tests for ALS
 -

 Key: SPARK-2837
 URL: https://issues.apache.org/jira/browse/SPARK-2837
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib
Reporter: Xiangrui Meng
Assignee: Burak Yavuz

 ALS (explicit/implicit)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-2836) performance tests for k-means

2014-08-12 Thread Burak Yavuz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz closed SPARK-2836.
--

Resolution: Fixed

 performance tests for k-means
 -

 Key: SPARK-2836
 URL: https://issues.apache.org/jira/browse/SPARK-2836
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib
Reporter: Xiangrui Meng
Assignee: Burak Yavuz





--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2834) performance tests for linear algebra functions

2014-08-12 Thread Burak Yavuz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz resolved SPARK-2834.


Resolution: Fixed

 performance tests for linear algebra functions
 --

 Key: SPARK-2834
 URL: https://issues.apache.org/jira/browse/SPARK-2834
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib
Reporter: Xiangrui Meng
Assignee: Burak Yavuz

 SVD and PCA



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2829) Implement MLlib performance tests in spark-perf

2014-08-12 Thread Burak Yavuz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz resolved SPARK-2829.


Resolution: Fixed

 Implement MLlib performance tests in spark-perf
 ---

 Key: SPARK-2829
 URL: https://issues.apache.org/jira/browse/SPARK-2829
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Xiangrui Meng
Assignee: Burak Yavuz

 We don't have performance tests for MLlib in spark-perf: 
 https://github.com/databricks/spark-perf
 So it is hard to catch regression problems automatically. This is an umbrella 
 JIRA for implementing performance tests for MLlib's algorithms.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2831) performance tests for linear classification methods

2014-08-12 Thread Burak Yavuz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz resolved SPARK-2831.


Resolution: Fixed

 performance tests for linear classification methods
 ---

 Key: SPARK-2831
 URL: https://issues.apache.org/jira/browse/SPARK-2831
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib
Reporter: Xiangrui Meng
Assignee: Burak Yavuz

 1. logistic regression
 2. linear svm
 3. with LBFGS
 4. naive bayes



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [MLLib]:choosing the Loss function

2014-08-11 Thread Burak Yavuz
Hi,

// Initialize the optimizer using logistic regression as the loss function with 
L2 regularization
val lbfgs = new LBFGS(new LogisticGradient(), new SquaredL2Updater())

// Set the hyperparameters
lbfgs.setMaxNumIterations(numIterations).setRegParam(regParam).setConvergenceTol(tol).setNumCorrections(numCor)

// Retrieve the weights
val weightsWithIntercept = lbfgs.optimize(data, initialWeights)

//Slice weights with intercept into weight and intercept

//Initialize Logistic Regression Model
val model = new LogisticRegressionModel(weights, intercept)

model.predict(test) //Make your predictions

The example code doesn't generate the Logistic Regression Model that you can 
make predictions with.

`LBFGS.runMiniBatchLBFGS` outputs a tuple of (weights, lossHistory). The 
example code was for a benchmark, so it was more interested in the loss 
history than in the model itself.

You can also run
`val (weightsWithIntercept, localLoss) = LBFGS.runMiniBatchLBFGS ...`

slice `weightsWithIntercept` into the intercept and the rest of the weights and 
instantiate the model again as:
val model = new LogisticRegressionModel(weights, intercept)
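
For concreteness, a minimal sketch of that slicing, assuming the intercept was appended as the last element of the weight vector (variable names are just for illustration):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.classification.LogisticRegressionModel

// weightsWithIntercept: the Vector returned by lbfgs.optimize(...)
val raw = weightsWithIntercept.toArray
val weights = Vectors.dense(raw.dropRight(1))  // everything except the bias term
val intercept = raw.last                       // the appended bias term
val model = new LogisticRegressionModel(weights, intercept)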


Burak



- Original Message -
From: SK skrishna...@gmail.com
To: u...@spark.incubator.apache.org
Sent: Monday, August 11, 2014 11:52:04 AM
Subject: Re: [MLLib]:choosing the Loss function

Hi,

Thanks for the reference to the LBFGS optimizer. 
I tried to use the LBFGS optimizer, but I am not able to pass it  as an
input to the LogisticRegression model for binary classification. After
studying the code in mllib/classification/LogisticRegression.scala, it
appears that the  only implementation of LogisticRegression uses
GradientDescent as a fixed optimizer. In other words, I dont see a
setOptimizer() function that I can use to change the optimizer to LBFGS.

I tried to follow the code in
https://github.com/dbtsai/spark-lbfgs-benchmark/blob/master/src/main/scala/org/apache/spark/mllib/benchmark/BinaryLogisticRegression.scala
that makes use of LBFGS, but it is not clear to me where the
LogisticRegression model with LBFGS is being returned that I can use for
the classification of the test dataset. 

If some one has sample code that uses LogisticRegression with LBFGS instead
of gradientDescent as the optimization algorithm, it would be helpful if you
can post it.

thanks 




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-choosing-the-Loss-function-tp11738p11913.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



[jira] [Commented] (SPARK-2916) [MLlib] While running regression tests with dense vectors of length greater than 1000, the treeAggregate blows up after several iterations

2014-08-08 Thread Burak Yavuz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090498#comment-14090498
 ] 

Burak Yavuz commented on SPARK-2916:


will do

 [MLlib] While running regression tests with dense vectors of length greater 
 than 1000, the treeAggregate blows up after several iterations
 --

 Key: SPARK-2916
 URL: https://issues.apache.org/jira/browse/SPARK-2916
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Reporter: Burak Yavuz
Priority: Blocker

 While running any of the regression algorithms with gradient descent, the 
 treeAggregate blows up after several iterations.
 Observed on EC2 cluster with 16 nodes, matrix dimensions of 1,000,000 x 5,000



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2916) [MLlib] While running regression tests with dense vectors of length greater than 1000, the treeAggregate blows up after several iterations

2014-08-08 Thread Burak Yavuz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz updated SPARK-2916:
---

Description: 
While running any of the regression algorithms with gradient descent, the 
treeAggregate blows up after several iterations.

Observed on EC2 cluster with 16 nodes, matrix dimensions of 1,000,000 x 5,000

In order to replicate the problem, use aggregate multiple times, maybe over 
50-60 times.

  was:
While running any of the regression algorithms with gradient descent, the 
treeAggregate blows up after several iterations.

Observed on EC2 cluster with 16 nodes, matrix dimensions of 1,000,000 x 5,000


 [MLlib] While running regression tests with dense vectors of length greater 
 than 1000, the treeAggregate blows up after several iterations
 --

 Key: SPARK-2916
 URL: https://issues.apache.org/jira/browse/SPARK-2916
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Reporter: Burak Yavuz
Priority: Blocker

 While running any of the regression algorithms with gradient descent, the 
 treeAggregate blows up after several iterations.
 Observed on EC2 cluster with 16 nodes, matrix dimensions of 1,000,000 x 5,000
 In order to replicate the problem, use aggregate multiple times, maybe over 
 50-60 times.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2916) [MLlib] While running regression tests with dense vectors of length greater than 1000, the treeAggregate blows up after several iterations

2014-08-08 Thread Burak Yavuz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz updated SPARK-2916:
---

Component/s: Spark Core

 [MLlib] While running regression tests with dense vectors of length greater 
 than 1000, the treeAggregate blows up after several iterations
 --

 Key: SPARK-2916
 URL: https://issues.apache.org/jira/browse/SPARK-2916
 Project: Spark
  Issue Type: Bug
  Components: MLlib, Spark Core
Reporter: Burak Yavuz
Priority: Blocker

 While running any of the regression algorithms with gradient descent, the 
 treeAggregate blows up after several iterations.
 Observed on EC2 cluster with 16 nodes, matrix dimensions of 1,000,000 x 5,000
 In order to replicate the problem, use aggregate multiple times, maybe over 
 50-60 times.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2916) [MLlib] While running regression tests with dense vectors of length greater than 1000, the treeAggregate blows up after several iterations

2014-08-07 Thread Burak Yavuz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz updated SPARK-2916:
---

Summary: [MLlib] While running regression tests with dense vectors of 
length greater than 1000, the treeAggregate blows up after several iterations  
(was: While running regression tests with dense vectors of length greater than 
1000, the treeAggregate blows up after several iterations)

 [MLlib] While running regression tests with dense vectors of length greater 
 than 1000, the treeAggregate blows up after several iterations
 --

 Key: SPARK-2916
 URL: https://issues.apache.org/jira/browse/SPARK-2916
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Reporter: Burak Yavuz
Priority: Blocker





--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-2916) While running regression tests with dense vectors of length greater than 1000, the treeAggregate blows up after several iterations

2014-08-07 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-2916:
--

 Summary: While running regression tests with dense vectors of 
length greater than 1000, the treeAggregate blows up after several iterations
 Key: SPARK-2916
 URL: https://issues.apache.org/jira/browse/SPARK-2916
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Reporter: Burak Yavuz
Priority: Blocker






--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2916) [MLlib] While running regression tests with dense vectors of length greater than 1000, the treeAggregate blows up after several iterations

2014-08-07 Thread Burak Yavuz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz updated SPARK-2916:
---

Description: 
While running any of the regression algorithms with gradient descent, the 
treeAggregate blows up after several iterations.

Observed on EC2 cluster with 16 nodes, matrix dimensions of 1,000,000 x 5,000

 [MLlib] While running regression tests with dense vectors of length greater 
 than 1000, the treeAggregate blows up after several iterations
 --

 Key: SPARK-2916
 URL: https://issues.apache.org/jira/browse/SPARK-2916
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Reporter: Burak Yavuz
Priority: Blocker

 While running any of the regression algorithms with gradient descent, the 
 treeAggregate blows up after several iterations.
 Observed on EC2 cluster with 16 nodes, matrix dimensions of 1,000,000 x 5,000



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: KMeans Input Format

2014-08-07 Thread Burak Yavuz
Hi,

Could you try running spark-shell with the flag --driver-memory 2g or more if 
you have more RAM available and try again?
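
For example, something along these lines (assuming you launch from the Spark home directory; adjust the amount to whatever RAM your machine actually has):

./bin/spark-shell --driver-memory 4g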

Thanks,
Burak

- Original Message -
From: AlexanderRiggers alexander.rigg...@gmail.com
To: u...@spark.incubator.apache.org
Sent: Thursday, August 7, 2014 7:37:40 AM
Subject: KMeans Input Format

I want to perform a K-Means task, but training the model fails and I get kicked
out of Spark's Scala shell before I get my result metrics. I am not sure if
the input format is the problem or something else. I use Spark 1.0.0 and my
input text file (400MB) looks like this:

86252 3711 15.4 4.18 86252 3504 28 1.25 86252 3703 10.75 8.85 86252 3703
10.5 5.55 86252 2201 64 2.79 12262064 7203 32 8.49 12262064 2119 32 1.99
12262064 3405 8.5 2.99 12262064 2119 23 0 12262064 2119 33.8 1.5 12262064
3611 23.7 1.95 etc.

It is ID, Category, ProductSize, PurchaseAmount. I am not sure if I can use
the first two, because the MLlib example file only uses floats. So I
also tried the last two:

16 2.49 64 3.29 56 1 16 3.29 6 4.99 10.75 0.79 4.6 3.99 11 1.18 5.8 1.25 15
0.99

My error code in both cases is here:

scala> import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.clustering.KMeans

scala> import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.Vectors

scala>

scala> // Load and parse the data

scala> val data = sc.textFile("data/outkmeanssm.txt")
14/08/07 16:15:37 INFO MemoryStore: ensureFreeSpace(35456) called with curMem=0, maxMem=318111744
14/08/07 16:15:37 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 34.6 KB, free 303.3 MB)
data: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:14

scala> val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble)))
parsedData: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] = MappedRDD[2] at map at <console>:16

scala>

scala> // Cluster the data into two classes using KMeans

scala> val numClusters = 2
numClusters: Int = 2

scala> val numIterations = 20
numIterations: Int = 20

scala> val clusters = KMeans.train(parsedData, numClusters, numIterations)
14/08/07 16:15:38 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/08/07 16:15:38 WARN LoadSnappy: Snappy native library not loaded 14/08/07
16:15:38 INFO FileInputFormat: Total input paths to process : 1 14/08/07
16:15:38 INFO SparkContext: Starting job: takeSample at KMeans.scala:260
14/08/07 16:15:38 INFO DAGScheduler: Got job 0 (takeSample at
KMeans.scala:260) with 7 output partitions (allowLocal=false) 14/08/07
16:15:38 INFO DAGScheduler: Final stage: Stage 0(takeSample at
KMeans.scala:260) 14/08/07 16:15:38 INFO DAGScheduler: Parents of final
stage: List() 14/08/07 16:15:38 INFO DAGScheduler: Missing parents: List()
14/08/07 16:15:38 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[6] at map
at KMeans.scala:123), which has no missing parents 14/08/07 16:15:39 INFO
DAGScheduler: Submitting 7 missing tasks from Stage 0 (MappedRDD[6] at map
at KMeans.scala:123) 14/08/07 16:15:39 INFO TaskSchedulerImpl: Adding task
set 0.0 with 7 tasks 14/08/07 16:15:39 INFO TaskSetManager: Starting task
0.0:0 as TID 0 on executor localhost: localhost (PROCESS_LOCAL) 14/08/07
16:15:39 INFO TaskSetManager: Serialized task 0.0:0 as 2221 bytes in 3 ms
14/08/07 16:15:39 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on
executor localhost: localhost (PROCESS_LOCAL) 14/08/07 16:15:39 INFO
TaskSetManager: Serialized task 0.0:1 as 2221 bytes in 0 ms 14/08/07
16:15:39 INFO TaskSetManager: Starting task 0.0:2 as TID 2 on executor
localhost: localhost (PROCESS_LOCAL) 14/08/07 16:15:39 INFO TaskSetManager:
Serialized task 0.0:2 as 2221 bytes in 0 ms 14/08/07 16:15:39 INFO
TaskSetManager: Starting task 0.0:3 as TID 3 on executor localhost:
localhost (PROCESS_LOCAL) 14/08/07 16:15:39 INFO TaskSetManager: Serialized
task 0.0:3 as 2221 bytes in 1 ms 14/08/07 16:15:39 INFO TaskSetManager:
Starting task 0.0:4 as TID 4 on executor localhost: localhost
(PROCESS_LOCAL) 14/08/07 16:15:39 INFO TaskSetManager: Serialized task 0.0:4
as 2221 bytes in 0 ms 14/08/07 16:15:39 INFO TaskSetManager: Starting task
0.0:5 as TID 5 on executor localhost: localhost (PROCESS_LOCAL) 14/08/07
16:15:39 INFO TaskSetManager: Serialized task 0.0:5 as 2221 bytes in 0 ms
14/08/07 16:15:39 INFO TaskSetManager: Starting task 0.0:6 as TID 6 on
executor localhost: localhost (PROCESS_LOCAL) 14/08/07 16:15:39 INFO
TaskSetManager: Serialized task 0.0:6 as 2221 bytes in 0 ms 14/08/07
16:15:39 INFO Executor: Running task ID 4 14/08/07 16:15:39 INFO Executor:
Running task ID 1 14/08/07 16:15:39 INFO Executor: Running task ID 5
14/08/07 16:15:39 INFO Executor: Running task ID 6 14/08/07 16:15:39 INFO
Executor: Running task ID 0 14/08/07 16:15:39 INFO Executor: Running task ID
3 14/08/07 16:15:39 INFO Executor: Running task ID 2 14/08/07 16:15:39 INFO
BlockManager: Found block broadcast_0 

Re: questions about MLLib recommendation models

2014-08-07 Thread Burak Yavuz
Hi Jay,

I've had the same problem you've been having in Question 1 with a synthetic 
dataset. I thought I wasn't producing the dataset well enough. This seems to
be a bug. I will open a JIRA for it.

Instead of using:

ratings.map{ case Rating(u,m,r) => {
    val pred = model.predict(u, m)
    (r - pred)*(r - pred)
  }
}.mean()

you can use something like:

val predictions: RDD[Rating] = model.predict(data.map(x => (x.user, x.product)))
val predictionsAndRatings: RDD[(Double, Double)] = predictions.map{ x =>
  def mapPredictedRating(r: Double) = if (implicitPrefs) math.max(math.min(r, 1.0), 0.0) else r
  ((x.user, x.product), mapPredictedRating(x.rating))
}.join(data.map(x => ((x.user, x.product), x.rating))).values

math.sqrt(predictionsAndRatings.map(x => (x._1 - x._2) * (x._1 - x._2)).mean())

This work around worked for me.

Regarding your question 2, it would be best if you do a special filtering of the 
dataset so that you only evaluate on users and products that you did train on.
If we don't have any data trained on a user, there is no way to predict how they 
would like a product.
That filtering takes a lot of work though. I can share some code on that too if 
you like.
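
For illustration, a rough sketch of that filtering, assuming `training` and `test` are both RDD[Rating] (the names are made up for the example):

import org.apache.spark.SparkContext._   // pair-RDD implicits for join (not needed inside spark-shell)
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.rdd.RDD

def filterToTrained(training: RDD[Rating], test: RDD[Rating]): RDD[Rating] = {
  val trainedUsers = training.map(_.user).distinct().map(u => (u, ()))
  val trainedProducts = training.map(_.product).distinct().map(p => (p, ()))
  // keep only test ratings whose user appears in the training set ...
  val byUser = test.map(r => (r.user, r)).join(trainedUsers).map(_._2._1)
  // ... and whose product appears in the training set as well
  byUser.map(r => (r.product, r)).join(trainedProducts).map(_._2._1)
}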

Best,
Burak

- Original Message -
From: Jay Hutfles jayhutf...@gmail.com
To: user@spark.apache.org
Sent: Thursday, August 7, 2014 1:06:33 PM
Subject: questions about MLLib recommendation models

I have a few questions regarding a collaborative filtering model, and was
hoping for some recommendations (no pun intended...)

*Setup*

I have a csv file with user/movie/ratings named unimaginatively
'movies.csv'.  Here are the contents:

0,0,5
0,1,5
0,2,0
0,3,0
1,0,5
1,3,0
2,1,4
2,2,0
3,0,0
3,1,0
3,2,5
3,3,4
4,0,0
4,1,0
4,2,5

I then load it into an RDD with a nice command like

val ratings = sc.textFile("movies.csv").map(_.split(',') match { case
Array(u,m,r) => Rating(u.toInt, m.toInt, r.toDouble) })

So far so good.  I'm even okay building a model for predicting the absent
values in the matrix with

val rank = 10
val iters = 20
val model = ALS.train(ratings, rank, iters)

I can then use the model to predict any user/movie rating without trouble,
like

model.predict(2, 0)

*Question 1: *

If I were to calculate, say, the mean squared error of the training set (or
to my next question, a test set), this doesn't work:

ratings.map{ case Rating(u,m,r) => {
    val pred = model.predict(u, m)
    (r - pred)*(r - pred)
  }
}.mean()

Actually, any action on RDDs created by mapping over the RDD[Rating] with a
model prediction  fails, like

ratings.map{ case Rating(u, m, _) => model.predict(u, m) }.collect

I get errors due to a scala.MatchError: null.  Here's the exact verbiage:


org.apache.spark.SparkException: Job aborted due to stage failure: Task
26150.0:1 failed 1 times, most recent failure: Exception failure in TID
7091 on host localhost: scala.MatchError: null

org.apache.spark.rdd.PairRDDFunctions.lookup(PairRDDFunctions.scala:571)

org.apache.spark.mllib.recommendation.MatrixFactorizationModel.predict(MatrixFactorizationModel.scala:43)
$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:18)
$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:18)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)

scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)

scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)

scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
scala.collection.AbstractIterator.to(Iterator.scala:1157)

scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)

scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:717)
org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:717)

org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)

org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
org.apache.spark.scheduler.Task.run(Task.scala:51)

org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)

I think I'm missing something, since I can build up a scala collection of
the exact (user, movie) tuples I'm testing, map over that with the model
prediction, and it works fine.  But if I map over the RDD[Rating], it
doesn't.  Am I doing something obviously wrong?


Re: [MLLib]:choosing the Loss function

2014-08-07 Thread Burak Yavuz
The following code will allow you to run Logistic Regression using L-BFGS:

val lbfgs = new LBFGS(new LogisticGradient(), new SquaredL2Updater())
lbfgs.setMaxNumIterations(numIterations).setRegParam(regParam).setConvergenceTol(tol).setNumCorrections(numCor)

val weights = lbfgs.optimize(data, initialWeights)

The different loss-function support you are asking for comes from the `new 
LogisticGradient()` part. The different regularization support
comes from the `new SquaredL2Updater()` part.

The supported loss functions are:
1) Logistic - LogisticGradient
2) LeastSquares - LeastSquaresGradient
3) Hinge - HingeGradient

The regularizers are:
0) No regularization - SimpleUpdater
1) L1 regularization - L1Updater
2) L2 regularization - SquaredL2Updater

You can find more here: 
http://spark.apache.org/docs/latest/mllib-linear-methods.html#loss-functions

I would suggest using L-BFGS rather than SGD as it's both much faster and more 
accurate.
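
For completeness, a minimal sketch of how the `data` and `initialWeights` passed to `lbfgs.optimize` could be prepared (this is an assumption about your setup, not the only way): labels paired with feature vectors that have a bias term appended.

import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

// toy training set; in practice load it e.g. with MLUtils.loadLibSVMFile
val training: RDD[LabeledPoint] = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(0.5, 1.2)),
  LabeledPoint(0.0, Vectors.dense(-0.3, 0.8))))

val data: RDD[(Double, Vector)] =
  training.map(lp => (lp.label, MLUtils.appendBias(lp.features)))

val numFeatures = training.first().features.size
val initialWeights = Vectors.dense(new Array[Double](numFeatures + 1)) // zeros; +1 for the bias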

Burak

- Original Message -
From: SK skrishna...@gmail.com
To: u...@spark.incubator.apache.org
Sent: Thursday, August 7, 2014 6:31:14 PM
Subject: [MLLib]:choosing the Loss function

Hi,

According to the MLLib guide, there seems to be support for different loss
functions. But I could not find a command line parameter to choose the loss
function but only found regType to choose the regularization. Does MLLib
support a parameter to choose  the loss function?

thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-choosing-the-Loss-function-tp11738.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



[Phpmyadmin-git] [phpmyadmin/localized_docs] 96fe35: Translated using Weblate (Turkish)

2014-08-06 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/localized_docs
  Commit: 96fe3575d28d71967ff6d906c4cc1c720014427e
  
https://github.com/phpmyadmin/localized_docs/commit/96fe3575d28d71967ff6d906c4cc1c720014427e
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-08-06 (Wed, 06 Aug 2014)

  Changed paths:
M po/tr.mo
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (1622 of 1622)

[ci skip]


--
Phpmyadmin-git mailing list
Phpmyadmin-git@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/phpmyadmin-git


Re: Regularization parameters

2014-08-06 Thread Burak Yavuz
Hi,

That is interesting. Could you please share some code showing how you are setting 
the regularization type and regularization parameter, and how you are running 
Logistic Regression?
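
In the meantime, for reference, here is the pattern I would expect, following the optimizer-customization example in the MLlib guide (treat the names as a sketch, not your exact code):

import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.optimization.{L1Updater, SquaredL2Updater}

val lr = new LogisticRegressionWithSGD()
lr.optimizer
  .setNumIterations(100)
  .setRegParam(0.0)                 // with 0.0 the penalty term contributes nothing
  .setUpdater(new SquaredL2Updater) // or new L1Updater for L1
val model = lr.run(training)        // training: RDD[LabeledPoint] prepared beforehand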

Thanks,
Burak

- Original Message -
From: SK skrishna...@gmail.com
To: u...@spark.incubator.apache.org
Sent: Wednesday, August 6, 2014 6:18:43 PM
Subject: Regularization parameters

Hi,

I tried different regularization parameter values with Logistic Regression
for binary classification of my dataset and would like to understand the
following results:

regType = L2, regParam = 0.0 , I am getting AUC = 0.80 and accuracy of 80% 
regType = L1, regParam = 0.0 , I am getting AUC = 0.80 and accuracy of 50%

To calculate accuracy I am using 0.5 as the threshold: prediction < 0.5 is class
0, and prediction >= 0.5 is class 1.

regParam = 0.0 implies I am not using any regularization, is that correct?
If so, it should not matter whether I specify L1 or L2; I should get the
same results. So why is the accuracy value different? 

thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Regularization-parameters-tp11601.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



[Phpmyadmin-git] [phpmyadmin/phpmyadmin] eab70e: Translated using Weblate (Turkish)

2014-08-05 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: eab70ea7b0e1e17034ffb90fe246cc836e76fd97
  
https://github.com/phpmyadmin/phpmyadmin/commit/eab70ea7b0e1e17034ffb90fe246cc836e76fd97
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-08-05 (Tue, 05 Aug 2014)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (2974 of 2974)

[ci skip]


--
Phpmyadmin-git mailing list
Phpmyadmin-git@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/phpmyadmin-git


Re: Hello All

2014-08-05 Thread Burak Yavuz
Hi Guru,

Take a look at:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

It has all the information you need on how to contribute to Spark. Also take a 
look at:
https://issues.apache.org/jira/browse/SPARK/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel

which lists issues that need fixing. You can also request or propose 
new additions to Spark.

Happy coding!
Burak

- Original Message -
From: Gurumurthy Yeleswarapu guru...@yahoo.com.INVALID
To: dev@spark.apache.org
Sent: Tuesday, August 5, 2014 2:43:04 PM
Subject: Hello All

I'm new to the Spark community. I'm actively working on the Hadoop ecosystem (more 
specifically YARN). I'm very keen on getting my hands dirty with Spark. 
Please let me know any pointers to start with.

Thanks in advance
Best regards
Guru Yeleswarapu


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[Phpmyadmin-git] [phpmyadmin/phpmyadmin] bc01c1: Translated using Weblate (Turkish)

2014-08-04 Thread Burak Yavuz
  Branch: refs/heads/master
  Home:   https://github.com/phpmyadmin/phpmyadmin
  Commit: bc01c12eefc26e03088e30f36fe84cd1e727379c
  
https://github.com/phpmyadmin/phpmyadmin/commit/bc01c12eefc26e03088e30f36fe84cd1e727379c
  Author: Burak Yavuz hitowerdi...@hotmail.com
  Date:   2014-08-04 (Mon, 04 Aug 2014)

  Changed paths:
M po/tr.po

  Log Message:
  ---
  Translated using Weblate (Turkish)

Currently translated at 100.0% (2962 of 2962)

[ci skip]


--
Phpmyadmin-git mailing list
Phpmyadmin-git@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/phpmyadmin-git

