[jira] [Created] (BEAM-2962) Instead of images, placeholders like [Sequential Graph Graphic] are present in documentation
Aseem Bansal created BEAM-2962: -- Summary: Instead of images, placeholders like [Sequential Graph Graphic] are present in documentation Key: BEAM-2962 URL: https://issues.apache.org/jira/browse/BEAM-2962 Project: Beam Issue Type: Bug Components: website Reporter: Aseem Bansal Assignee: Reuven Lax I was reading the documentation at https://beam.apache.org/documentation/programming-guide/ and saw this being present {noformat} The resulting workflow graph of the above pipeline looks like this: [Sequential Graph Graphic] {noformat} Looking at the above, it seems that the text [Sequential Graph Graphic] is a placeholder which was supposed to be replaced by an image but was not. Similarly, on this page there are other places where text is present inside [ .. ] and it seems that an image was supposed to be there but is not. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (SPARK-21483) Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in Encoders.bean(Vector.class)
[ https://issues.apache.org/jira/browse/SPARK-21483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094607#comment-16094607 ] Aseem Bansal edited comment on SPARK-21483 at 7/20/17 12:29 PM: Some pseudocode to show what I am trying to achieve {code:java} class MyTransformer implements Serializable { public FeaturesAndLabel transform(RawData rawData) { //Some logic which creates Features and Labels from raw data. Raw data is just a java bean //FeaturesAndLabel is a bean which contains a SparseVector as features, and double as label } } {code} {code:java} Dataset dataset = //read from somewhere and create Dataset of RawData bean Dataset featuresAndLabels = dataset.transform(new MyTransformer()::transform) //use features and labels for machine learning {code} was (Author: anshbansal): Some pseudocode to show what I am trying to achieve {code:java} class MyTransformer implements Serializable { public FeaturesAndLabel transform(RawData rawData) { //Some logic which creates Features and Labels from raw data //FeaturesAndLabel is a bean which contains a SparseVector as features, and double as label } } {code} {code:java} Dataset dataset = //read from somewhere and create Dataset of RawData bean Dataset featuresAndLabels = dataset.transform(new MyTransformer()::transform) //use features and labels for machine learning {code} > Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in > Encoders.bean(Vector.class) > -- > > Key: SPARK-21483 > URL: https://issues.apache.org/jira/browse/SPARK-21483 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.1.0 >Reporter: Aseem Bansal >Priority: Minor > > The class org.apache.spark.ml.linalg.Vector is currently not bean-compliant > as per spark. > This makes it impossible to create a Vector via a dataset.transform. It should > be made bean-compliant so it can be used. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21483) Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in Encoders.bean(Vector.class)
[ https://issues.apache.org/jira/browse/SPARK-21483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094607#comment-16094607 ] Aseem Bansal commented on SPARK-21483: -- Some pseudocode to show what I am trying to achieve {code:java} class MyTransformer implements Serializable { public FeaturesAndLabel transform(RawData rawData) { //Some logic which creates Features and Labels from raw data //FeaturesAndLabel is a bean which contains a SparseVector as features, and double as label } } {code} {code:java} Dataset dataset = //read from somewhere and create Dataset of RawData bean Dataset featuresAndLabels = dataset.transform(new MyTransformer()::transform) //use features and labels for machine learning {code} > Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in > Encoders.bean(Vector.class) > -- > > Key: SPARK-21483 > URL: https://issues.apache.org/jira/browse/SPARK-21483 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.1.0 >Reporter: Aseem Bansal >Priority: Minor > > The class org.apache.spark.ml.linalg.Vector is currently not bean-compliant > as per spark. > This makes it impossible to create a Vector via a dataset.transform. It should > be made bean-compliant so it can be used. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-21483) Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in Encoders.bean(Vector.class)
[ https://issues.apache.org/jira/browse/SPARK-21483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094314#comment-16094314 ] Aseem Bansal edited comment on SPARK-21483 at 7/20/17 9:11 AM: --- No it does not. Can you give a link to what you are referring to? And I am not using spark SQL. I am using Dataset's transformations only. was (Author: anshbansal): Now it does not. Can you give a link to what you are referring to? And I am not using spark SQL. I am using Dataset's transformations only. > Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in > Encoders.bean(Vector.class) > -- > > Key: SPARK-21483 > URL: https://issues.apache.org/jira/browse/SPARK-21483 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.1.0 >Reporter: Aseem Bansal >Priority: Minor > > The class org.apache.spark.ml.linalg.Vector is currently not bean-compliant > as per spark. > This makes it impossible to create a Vector via a dataset.tranform. It should > be made bean-compliant so it can be used. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21482) Make LabeledPoint bean-compliant so it can be used in Encoders.bean(LabeledPoint.class)
[ https://issues.apache.org/jira/browse/SPARK-21482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094315#comment-16094315 ] Aseem Bansal commented on SPARK-21482: -- There is a LabeledPoint in new ml api too https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.ml.feature.LabeledPoint I am able to workaround via using my own class. But I thought the ML package was supposed to be used with the dataset's API. That's why I am saying it should support this. > Make LabeledPoint bean-compliant so it can be used in > Encoders.bean(LabeledPoint.class) > --- > > Key: SPARK-21482 > URL: https://issues.apache.org/jira/browse/SPARK-21482 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.1.0 >Reporter: Aseem Bansal >Priority: Minor > > The LabeledPoint class is currently not bean-compliant as per spark > https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.ml.feature.LabeledPoint > This makes it impossible to create a LabeledPoint via a dataset.tranform. It > should be made bean-compliant so it can be used. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
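The workaround the comment alludes to ("using my own class") can be sketched as a plain Java bean. This is a hypothetical illustration, not Spark code: the class name and the use of `double[]` in place of the non-bean-compliant `ml.linalg.Vector` are assumptions made for the sketch. The point is the shape `Encoders.bean` reflection expects, namely a public no-arg constructor plus matching getter/setter pairs.

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

// Hypothetical bean-compliant stand-in for ml.feature.LabeledPoint.
// features is a double[] here only because ml.linalg.Vector itself is
// not bean-compliant, which is exactly what this issue is about.
public class LabeledPointBean {
    private double label;
    private double[] features;

    public LabeledPointBean() {}  // no-arg constructor required for bean encoders

    public double getLabel() { return label; }
    public void setLabel(double label) { this.label = label; }

    public double[] getFeatures() { return features; }
    public void setFeatures(double[] features) { this.features = features; }

    // Check via java.beans introspection that every declared property
    // has both a read and a write method.
    public static boolean isBeanCompliant(Class<?> cls) throws Exception {
        BeanInfo info = Introspector.getBeanInfo(cls, Object.class);
        PropertyDescriptor[] props = info.getPropertyDescriptors();
        for (PropertyDescriptor pd : props) {
            if (pd.getReadMethod() == null || pd.getWriteMethod() == null) {
                return false;
            }
        }
        return props.length > 0;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isBeanCompliant(LabeledPointBean.class));
    }
}
```

A class shaped like this can be used with `Encoders.bean(LabeledPointBean.class)`, at the cost of converting to and from the real `Vector`/`LabeledPoint` types at the boundaries.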
[jira] [Commented] (SPARK-21483) Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in Encoders.bean(Vector.class)
[ https://issues.apache.org/jira/browse/SPARK-21483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094314#comment-16094314 ] Aseem Bansal commented on SPARK-21483: -- Now it does not. Can you give a link to what you are referring to? And I am not using spark SQL. I am using Dataset's transformations only. > Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in > Encoders.bean(Vector.class) > -- > > Key: SPARK-21483 > URL: https://issues.apache.org/jira/browse/SPARK-21483 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.1.0 >Reporter: Aseem Bansal >Priority: Minor > > The class org.apache.spark.ml.linalg.Vector is currently not bean-compliant > as per spark. > This makes it impossible to create a Vector via a dataset.tranform. It should > be made bean-compliant so it can be used. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21483) Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in Encoders.bean(Vector.class)
[ https://issues.apache.org/jira/browse/SPARK-21483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094297#comment-16094297 ] Aseem Bansal commented on SPARK-21483: -- How would you encode it otherwise? > Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in > Encoders.bean(Vector.class) > -- > > Key: SPARK-21483 > URL: https://issues.apache.org/jira/browse/SPARK-21483 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.1.0 >Reporter: Aseem Bansal >Priority: Minor > > The class org.apache.spark.ml.linalg.Vector is currently not bean-compliant > as per spark. > This makes it impossible to create a Vector via a dataset.tranform. It should > be made bean-compliant so it can be used. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21482) Make LabeledPoint bean-compliant so it can be used in Encoders.bean(LabeledPoint.class)
[ https://issues.apache.org/jira/browse/SPARK-21482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094295#comment-16094295 ] Aseem Bansal commented on SPARK-21482: -- I am using Java API. I tried a simple transformation with {noformat} dataset.transform(MyCustomToLabeledPointTransformer::transformer, Encoders.bean(LabeledPoint.class)) {noformat} and it threw bean-compliance exception. I am not sure whether the encoders should act on beans or not but clearly something is going on due to which they are acting on beans. > Make LabeledPoint bean-compliant so it can be used in > Encoders.bean(LabeledPoint.class) > --- > > Key: SPARK-21482 > URL: https://issues.apache.org/jira/browse/SPARK-21482 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.1.0 >Reporter: Aseem Bansal >Priority: Minor > > The LabeledPoint class is currently not bean-compliant as per spark > https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.ml.feature.LabeledPoint > This makes it impossible to create a LabeledPoint via a dataset.tranform. It > should be made bean-compliant so it can be used. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21483) Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in Encoders.bean(Vector.class)
Aseem Bansal created SPARK-21483: Summary: Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in Encoders.bean(Vector.class) Key: SPARK-21483 URL: https://issues.apache.org/jira/browse/SPARK-21483 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 2.1.0 Reporter: Aseem Bansal The class org.apache.spark.ml.linalg.Vector is currently not bean-compliant as per spark. This makes it impossible to create a Vector via a dataset.transform. It should be made bean-compliant so it can be used. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21482) Make LabeledPoint bean-compliant so it can be used in Encoders.bean(LabeledPoint.class)
Aseem Bansal created SPARK-21482: Summary: Make LabeledPoint bean-compliant so it can be used in Encoders.bean(LabeledPoint.class) Key: SPARK-21482 URL: https://issues.apache.org/jira/browse/SPARK-21482 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 2.1.0 Reporter: Aseem Bansal The LabeledPoint class is currently not bean-compliant as per spark https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.ml.feature.LabeledPoint This makes it impossible to create a LabeledPoint via a dataset.transform. It should be made bean-compliant so it can be used. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21481) Add indexOf method in ml.feature.HashingTF similar to mllib.feature.HashingTF
Aseem Bansal created SPARK-21481: Summary: Add indexOf method in ml.feature.HashingTF similar to mllib.feature.HashingTF Key: SPARK-21481 URL: https://issues.apache.org/jira/browse/SPARK-21481 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 2.2.0, 2.1.0 Reporter: Aseem Bansal If we want to find the index of any input based on hashing trick then it is possible in https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.mllib.feature.HashingTF but not in https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.ml.feature.HashingTF. Should allow that for feature parity -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
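For context, `indexOf` in the mllib `HashingTF` is just the hashing trick: hash the term and take a non-negative modulo of the feature dimension. The plain-Java sketch below illustrates the idea only; it is not Spark code, and the concrete hash function differs between packages (the mllib version historically hashed via the term's `hashCode`, while the ml version defaults to MurmurHash3), so the indices it produces should not be expected to match either library exactly.

```java
// Illustrative sketch of the hashing trick behind HashingTF.indexOf:
// map an arbitrary term to a bucket index in [0, numFeatures).
public class HashingTrick {
    // Java's % operator can return a negative value for negative inputs,
    // so fold the result back into [0, mod).
    static int nonNegativeMod(int x, int mod) {
        int raw = x % mod;
        return raw < 0 ? raw + mod : raw;
    }

    // Deterministic term -> index mapping; hashCode is a stand-in for
    // whichever hash function the real implementation uses.
    static int indexOf(Object term, int numFeatures) {
        return nonNegativeMod(term.hashCode(), numFeatures);
    }

    public static void main(String[] args) {
        int numFeatures = 1 << 18;  // 2^18, a common default dimension
        int idx = indexOf("spark", numFeatures);
        System.out.println(idx >= 0 && idx < numFeatures);  // always in range
    }
}
```

Exposing such a method on the ml `HashingTF` would let users ask which column of the output vector a given input term landed in, which is the feature-parity gap this issue describes.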
[jira] [Updated] (SPARK-21473) Running Transform on a bean which has only setters gives NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-21473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aseem Bansal updated SPARK-21473: - Description: If I run the following using the Java API {code:java} dataset.map(Transformer::transform, Encoders.bean(BeanWithOnlySettersAndNoGetters.class)); {code} Then I get the below exception. I understand that it is not bean-compliant without the getters but the exception is wrong. Perhaps fixing the exception message would be a solution? {noformat} Caused by: java.lang.NullPointerException at org.spark_project.guava.reflect.TypeToken.method(TypeToken.java:465) at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:126) at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:125) at org.apache.spark.sql.catalyst.JavaTypeInference$.inferDataType(JavaTypeInference.scala:55) at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:89) at org.apache.spark.sql.Encoders$.bean(Encoders.scala:142) at org.apache.spark.sql.Encoders.bean(Encoders.scala) {noformat} was: If I run the following {code:java} dataset.map(Transformer::transform, Encoders.bean(BeanWithOnlySettersAndNoGetters.class)); {code} Then I get the below exception. 
I understand that it is not bean-compliant without the getters but the exception is wrong. Perhaps fixing the exception message would be a solution? {noformat} Caused by: java.lang.NullPointerException at org.spark_project.guava.reflect.TypeToken.method(TypeToken.java:465) at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:126) at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:125) at org.apache.spark.sql.catalyst.JavaTypeInference$.inferDataType(JavaTypeInference.scala:55) at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:89) at org.apache.spark.sql.Encoders$.bean(Encoders.scala:142) at org.apache.spark.sql.Encoders.bean(Encoders.scala) {noformat} > Running Transform on a bean which has only setters gives NullPointerExcpetion > - > > Key: SPARK-21473 > URL: https://issues.apache.org/jira/browse/SPARK-21473 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 >Reporter: Aseem Bansal > > If I run the following using the Java API > {code:java} > dataset.map(Transformer::transform, > Encoders.bean(BeanWithOnlySettersAndNoGetters.class)); > {code} > Then I get the below exception. I understand that it is not bean-compliant > without the getters but the exception is wrong. 
Perhaps fixing the exception > message would be a solution? > {noformat} > Caused by: java.lang.NullPointerException > at org.spark_project.guava.reflect.TypeToken.method(TypeToken.java:465) > at > org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:126) > at > org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.IndexedSeqOptimized$class.foreach(I
[jira] [Created] (SPARK-21473) Running Transform on a bean which has only setters gives NullPointerException
Aseem Bansal created SPARK-21473: Summary: Running Transform on a bean which has only setters gives NullPointerExcpetion Key: SPARK-21473 URL: https://issues.apache.org/jira/browse/SPARK-21473 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.1.0 Reporter: Aseem Bansal If I run the following {code:java} dataset.map(Transformer::transform, Encoders.bean(BeanWithOnlySettersAndNoGetters.class)); {code} Then I get the below exception. I understand that it is not bean-compliant without the getters but the exception is wrong. Perhaps fixing the exception message would be a solution? {noformat} Caused by: java.lang.NullPointerException at org.spark_project.guava.reflect.TypeToken.method(TypeToken.java:465) at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:126) at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:125) at org.apache.spark.sql.catalyst.JavaTypeInference$.inferDataType(JavaTypeInference.scala:55) at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:89) at org.apache.spark.sql.Encoders$.bean(Encoders.scala:142) at org.apache.spark.sql.Encoders.bean(Encoders.scala) {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, 
e-mail: issues-h...@spark.apache.org
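The hazard underneath this stack trace can be reproduced in miniature without Spark: for a setters-only class, JavaBeans introspection reports a write-only property whose read method is null, and any reflection code that dereferences the read method without a null check (roughly what the type-inference path here appears to do) throws a NullPointerException. The class below is a hypothetical sketch, not the reporter's actual bean.

```java
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

// Minimal, Spark-free reproduction of the underlying problem: a bean
// with a setter but no matching getter.
public class SettersOnlyBean {
    private String name;

    public SettersOnlyBean() {}

    public void setName(String name) { this.name = name; }  // no getName()

    // Returns true if introspection finds a write-only property, i.e. a
    // PropertyDescriptor whose read method is null. That null is what a
    // careless caller dereferences, producing the NPE in the report.
    public static boolean hasNullReadMethod(Class<?> cls) throws Exception {
        for (PropertyDescriptor pd :
                Introspector.getBeanInfo(cls, Object.class).getPropertyDescriptors()) {
            if (pd.getWriteMethod() != null && pd.getReadMethod() == null) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hasNullReadMethod(SettersOnlyBean.class));
    }
}
```

This supports the ticket's suggestion: the fix need not make such beans work, only replace the raw NPE with a message saying the class is not bean-compliant because a getter is missing.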
[jira] [Commented] (SPARK-17742) Spark Launcher does not get failed state in Listener
[ https://issues.apache.org/jira/browse/SPARK-17742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954714#comment-15954714 ] Aseem Bansal commented on SPARK-17742: -- [~daanvdn] We ended up using Kafka messages to communicate to the web app that was using the launcher to launch the job whether the job was complete or failed. We dropped the Launcher's states, as they are broken. > Spark Launcher does not get failed state in Listener > - > > Key: SPARK-17742 > URL: https://issues.apache.org/jira/browse/SPARK-17742 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.0 >Reporter: Aseem Bansal > > I tried to launch an application using the below code. This is dummy code to > reproduce the problem. I tried exiting spark with status -1, throwing an > exception etc. but in no case did the listener give me failed status. But if > a spark job returns -1 or throws an exception from the main method it should > be considered as a failure. 
> {code} > package com.example; > import org.apache.spark.launcher.SparkAppHandle; > import org.apache.spark.launcher.SparkLauncher; > import java.io.IOException; > public class Main2 { > public static void main(String[] args) throws IOException, > InterruptedException { > SparkLauncher launcher = new SparkLauncher() > .setSparkHome("/opt/spark2") > > .setAppResource("/home/aseem/projects/testsparkjob/build/libs/testsparkjob-1.0-SNAPSHOT.jar") > .setMainClass("com.example.Main") > .setMaster("local[2]"); > launcher.startApplication(new MyListener()); > Thread.sleep(1000 * 60); > } > } > class MyListener implements SparkAppHandle.Listener { > @Override > public void stateChanged(SparkAppHandle handle) { > System.out.println("state changed " + handle.getState()); > } > @Override > public void infoChanged(SparkAppHandle handle) { > System.out.println("info changed " + handle.getState()); > } > } > {code} > The spark job is > {code} > package com.example; > import org.apache.spark.sql.SparkSession; > import java.io.IOException; > public class Main { > public static void main(String[] args) throws IOException { > SparkSession sparkSession = SparkSession > .builder() > .appName("" + System.currentTimeMillis()) > .getOrCreate(); > try { > for (int i = 0; i < 15; i++) { > Thread.sleep(1000); > System.out.println("sleeping 1"); > } > } catch (InterruptedException e) { > e.printStackTrace(); > } > //sparkSession.stop(); > System.exit(-1); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10413) ML models should support prediction on single instances
[ https://issues.apache.org/jira/browse/SPARK-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857888#comment-15857888 ] Aseem Bansal commented on SPARK-10413: -- Something to look at would be https://github.com/combust/mleap which provides this on top of spark > ML models should support prediction on single instances > --- > > Key: SPARK-10413 > URL: https://issues.apache.org/jira/browse/SPARK-10413 > Project: Spark > Issue Type: Umbrella > Components: ML >Reporter: Xiangrui Meng >Priority: Critical > > Currently models in the pipeline API only implement transform(DataFrame). It > would be quite useful to support prediction on single instance. > UPDATE: This issue is for making predictions with single models. We can make > methods like {{def predict(features: Vector): Double}} public. > * This issue is *not* for single-instance prediction for full Pipelines, > which would require making predictions on {{Row}}s. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19449) Inconsistent results between ml package RandomForestClassificationModel and mllib package RandomForestModel
[ https://issues.apache.org/jira/browse/SPARK-19449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15853629#comment-15853629 ] Aseem Bansal commented on SPARK-19449: -- [~sowen] My results are actually deterministic. No matter how many times I run it, the numbers of true positives, true negatives, false positives, and false negatives are always exactly the same. The problem is that the 2 implementations also always disagree with each other, and by exactly the same amount every run. > Inconsistent results between ml package RandomForestClassificationModel and > mllib package RandomForestModel > --- > > Key: SPARK-19449 > URL: https://issues.apache.org/jira/browse/SPARK-19449 > Project: Spark > Issue Type: Bug > Components: ML, MLlib >Affects Versions: 2.1.0 >Reporter: Aseem Bansal > > I worked on some code to convert ml package RandomForestClassificationModel > to mllib package RandomForestModel. It was needed because we need to make > predictions on the order of ms. I found that the results are inconsistent > although the underlying DecisionTreeModel are exactly the same. So the > behavior between the 2 implementations is inconsistent which should not be > the case. > The below code can be used to reproduce the issue. Can run this as a simple > Java app as long as you have spark dependencies set up properly. 
> {noformat} > import org.apache.spark.ml.Transformer; > import org.apache.spark.ml.classification.*; > import org.apache.spark.ml.linalg.*; > import org.apache.spark.ml.regression.RandomForestRegressionModel; > import org.apache.spark.mllib.linalg.DenseVector; > import org.apache.spark.mllib.linalg.Vector; > import org.apache.spark.mllib.tree.configuration.Algo; > import org.apache.spark.mllib.tree.model.DecisionTreeModel; > import org.apache.spark.mllib.tree.model.RandomForestModel; > import org.apache.spark.sql.Dataset; > import org.apache.spark.sql.Row; > import org.apache.spark.sql.RowFactory; > import org.apache.spark.sql.SparkSession; > import org.apache.spark.sql.types.DataTypes; > import org.apache.spark.sql.types.Metadata; > import org.apache.spark.sql.types.StructField; > import org.apache.spark.sql.types.StructType; > import scala.Enumeration; > import java.util.ArrayList; > import java.util.List; > import java.util.Random; > abstract class Predictor { > abstract double predict(Vector vector); > } > public class MainConvertModels { > public static final int seed = 42; > public static void main(String[] args) { > int numRows = 1000; > int numFeatures = 3; > int numClasses = 2; > double trainFraction = 0.8; > double testFraction = 0.2; > SparkSession spark = SparkSession.builder() > .appName("conversion app") > .master("local") > .getOrCreate(); > Dataset data = getDummyData(spark, numRows, numFeatures, > numClasses); > Dataset[] splits = data.randomSplit(new double[]{trainFraction, > testFraction}, seed); > Dataset trainingData = splits[0]; > Dataset testData = splits[1]; > testData.cache(); > List labels = getLabels(testData); > List features = getFeatures(testData); > DecisionTreeClassifier classifier1 = new DecisionTreeClassifier(); > DecisionTreeClassificationModel model1 = > classifier1.fit(trainingData); > final DecisionTreeModel convertedModel1 = > convertDecisionTreeModel(model1, Algo.Classification()); > RandomForestClassifier classifier = new 
RandomForestClassifier(); > RandomForestClassificationModel model2 = classifier.fit(trainingData); > final RandomForestModel convertedModel2 = > convertRandomForestModel(model2); > System.out.println( > "** DecisionTreeClassifier\n" + > "** Original **" + getInfo(model1, testData) + "\n" + > "** New **" + getInfo(new Predictor() { > double predict(Vector vector) {return > convertedModel1.predict(vector);} > }, labels, features) + "\n" + > "\n" + > "** RandomForestClassifier\n" + > "** Original **" + getInfo(model2, testData) + "\n" + > "** New **" + getInfo(new Predictor() {double > predict(Vector vector) {return convertedModel2.predict(vector);}}, labels, > features) + "\n" + > "\n" + > ""); > } > static Dataset getDummyData(SparkSession spark, int numberRows, int > numberFeatures, int labelUpperBound) { > StructType schema = new StructType(new StructField[]{ > new StructField("label",
[jira] [Commented] (SPARK-19444) Tokenizer example does not compile without extra imports
[ https://issues.apache.org/jira/browse/SPARK-19444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851588#comment-15851588 ] Aseem Bansal commented on SPARK-19444: -- https://github.com/apache/spark/pull/16789 > Tokenizer example does not compile without extra imports > > > Key: SPARK-19444 > URL: https://issues.apache.org/jira/browse/SPARK-19444 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.1.0 >Reporter: Aseem Bansal >Priority: Minor > > The example at http://spark.apache.org/docs/2.1.0/ml-features.html#tokenizer > does not compile without the following static import > import static org.apache.spark.sql.functions.*; -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19449) Inconsistent results between ml package RandomForestClassificationModel and mllib package RandomForestModel
[ https://issues.apache.org/jira/browse/SPARK-19449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851576#comment-15851576 ] Aseem Bansal commented on SPARK-19449: -- Doesn't the decision tree debug string print it as a series of IF-ELSE statements? I printed the debug string for the 2 random forest models and it was exactly the same. In other words, the 2 implementations should be mathematically equivalent. The random processes for selecting data should not cause any issues, as I ensured that the exact same data goes to both versions. It works for decision trees, and a random forest classifier is just a majority vote of a bunch of decision tree classifiers, so I cannot see how that could be different. > Inconsistent results between ml package RandomForestClassificationModel and > mllib package RandomForestModel > --- > > Key: SPARK-19449 > URL: https://issues.apache.org/jira/browse/SPARK-19449 > Project: Spark > Issue Type: Bug > Components: ML, MLlib >Affects Versions: 2.1.0 >Reporter: Aseem Bansal > > I worked on some code to convert ml package RandomForestClassificationModel > to mllib package RandomForestModel. It was needed because we need to make > predictions on the order of ms. I found that the results are inconsistent > although the underlying DecisionTreeModel are exactly the same. So the > behavior between the 2 implementations is inconsistent which should not be > the case. > The below code can be used to reproduce the issue. Can run this as a simple > Java app as long as you have spark dependencies set up properly. 
> {noformat}
> import org.apache.spark.ml.Transformer;
> import org.apache.spark.ml.classification.*;
> import org.apache.spark.ml.linalg.*;
> import org.apache.spark.ml.regression.RandomForestRegressionModel;
> import org.apache.spark.mllib.linalg.DenseVector;
> import org.apache.spark.mllib.linalg.Vector;
> import org.apache.spark.mllib.tree.configuration.Algo;
> import org.apache.spark.mllib.tree.model.DecisionTreeModel;
> import org.apache.spark.mllib.tree.model.RandomForestModel;
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.RowFactory;
> import org.apache.spark.sql.SparkSession;
> import org.apache.spark.sql.types.DataTypes;
> import org.apache.spark.sql.types.Metadata;
> import org.apache.spark.sql.types.StructField;
> import org.apache.spark.sql.types.StructType;
> import scala.Enumeration;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Random;
>
> abstract class Predictor {
>     abstract double predict(Vector vector);
> }
>
> public class MainConvertModels {
>     public static final int seed = 42;
>
>     public static void main(String[] args) {
>         int numRows = 1000;
>         int numFeatures = 3;
>         int numClasses = 2;
>         double trainFraction = 0.8;
>         double testFraction = 0.2;
>         SparkSession spark = SparkSession.builder()
>                 .appName("conversion app")
>                 .master("local")
>                 .getOrCreate();
>
>         Dataset data = getDummyData(spark, numRows, numFeatures, numClasses);
>         Dataset[] splits = data.randomSplit(new double[]{trainFraction, testFraction}, seed);
>         Dataset trainingData = splits[0];
>         Dataset testData = splits[1];
>         testData.cache();
>         List labels = getLabels(testData);
>         List features = getFeatures(testData);
>
>         DecisionTreeClassifier classifier1 = new DecisionTreeClassifier();
>         DecisionTreeClassificationModel model1 = classifier1.fit(trainingData);
>         final DecisionTreeModel convertedModel1 = convertDecisionTreeModel(model1, Algo.Classification());
>
>         RandomForestClassifier classifier = new RandomForestClassifier();
>         RandomForestClassificationModel model2 = classifier.fit(trainingData);
>         final RandomForestModel convertedModel2 = convertRandomForestModel(model2);
>
>         System.out.println(
>                 "** DecisionTreeClassifier\n" +
>                 "** Original **" + getInfo(model1, testData) + "\n" +
>                 "** New **" + getInfo(new Predictor() {
>                     double predict(Vector vector) {return convertedModel1.predict(vector);}
>                 }, labels, features) + "\n" +
>                 "\n" +
>                 "** RandomForestClassifier\n" +
>                 "** Original **" + getInfo(model2, testData) + "\n" +
>                 "** New **" + getInfo(new Predictor() {double predict(Vector vector) {return convertedModel2.predict(vector);}}, labels, features) + "\n" +
>                 "\n" +
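The majority-vote reasoning in the comment above can be illustrated without any Spark dependency. A minimal plain-Java sketch, where `MajorityVoteSketch` and its stub "trees" are hypothetical stand-ins for the real DecisionTreeModel ensemble (not the Spark API): if every tree makes the same prediction for an input, any majority-vote combination of them must agree with the individual trees.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.DoubleUnaryOperator;

// Plain-Java stand-in for a forest: a "tree" is just a function from a
// feature value to a predicted class label (0.0 or 1.0).
public class MajorityVoteSketch {
    // Majority vote over the trees' predictions for input x.
    static double majorityVote(List<DoubleUnaryOperator> trees, double x) {
        long ones = trees.stream().filter(t -> t.applyAsDouble(x) == 1.0).count();
        return ones * 2 > trees.size() ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        // Three identical stub "trees": predict 1.0 iff the feature is positive.
        List<DoubleUnaryOperator> trees = Arrays.asList(
                x -> x > 0 ? 1.0 : 0.0,
                x -> x > 0 ? 1.0 : 0.0,
                x -> x > 0 ? 1.0 : 0.0);
        // The vote agrees with each individual tree on every input.
        System.out.println(majorityVote(trees, 0.5));   // prints 1.0
        System.out.println(majorityVote(trees, -0.5));  // prints 0.0
    }
}
```

This is why identical debug strings for the underlying trees make differing forest predictions surprising: with the same trees, the vote should be the same.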
[jira] [Commented] (SPARK-19449) Inconsistent results between ml package RandomForestClassificationModel and mllib package RandomForestModel
[ https://issues.apache.org/jira/browse/SPARK-19449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851568#comment-15851568 ] Aseem Bansal commented on SPARK-19449:
--
[~srowen] I removed some extra code. The part where I did the conversion is at the end, in the convertRandomForestModel method. Basically, the code above does this:
- Prepare 1000 rows of data with 3 features randomly, and prepare 1000 labels randomly. I am not working on creating the model but on the conversion, so having random data is not an issue; it will just be a horrible model.
- Split the data in an 80/20 ratio for training/test.
- Train the ml versions of the decision tree model and the random forest model using the training set. Let's call them DT1 and RF1.
- Convert these to the mllib versions of the models. Let's call them DT2 and RF2.
- Use the test set to predict labels using DT1, DT2, RF1, and RF2.
- Compare the predicted labels of DT1 with DT2: same results.
- Compare the predicted labels of RF1 with RF2: different results.

There should not be any randomness in the results, as I have used seeds for the random number generators everywhere and then used exactly the same data for predictions with all four models.

> Inconsistent results between ml package RandomForestClassificationModel and
> mllib package RandomForestModel
> ---
>
> Key: SPARK-19449
> URL: https://issues.apache.org/jira/browse/SPARK-19449
> Project: Spark
> Issue Type: Bug
> Components: ML, MLlib
> Affects Versions: 2.1.0
> Reporter: Aseem Bansal
>
> I worked on some code to convert the ml package RandomForestClassificationModel
> to the mllib package RandomForestModel. It was needed because we need to make
> predictions on the order of milliseconds. I found that the results are inconsistent
> even though the underlying DecisionTreeModel instances are exactly the same, so the
> behavior of the two implementations is inconsistent, which should not be the case.
> The code below can be used to reproduce the issue.
> It can be run as a simple Java app as long as you have the Spark dependencies set up properly.
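The claim above that seeding rules out randomness as the cause can be checked in isolation. A small plain-Java sketch (no Spark dependency): two generators created with the same seed produce identical sequences, so a seeded 80/20 split, and therefore the data fed to all four models, is the same across runs.

```java
import java.util.Random;

// Two java.util.Random instances with the same seed emit the same sequence,
// which is why a seeded randomSplit gives every model the same rows.
public class SeedDeterminism {
    public static void main(String[] args) {
        Random r1 = new Random(42);
        Random r2 = new Random(42);
        for (int i = 0; i < 5; i++) {
            double a = r1.nextDouble();
            double b = r2.nextDouble();
            if (a != b) throw new AssertionError("sequences diverged");
        }
        System.out.println("identical sequences");  // prints "identical sequences"
    }
}
```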
[jira] [Updated] (SPARK-19449) Inconsistent results between ml package RandomForestClassificationModel and mllib package RandomForestModel
[ https://issues.apache.org/jira/browse/SPARK-19449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aseem Bansal updated SPARK-19449:
-
Description: I worked on some code to convert the ml package RandomForestClassificationModel to the mllib package RandomForestModel. It was needed because we need to make predictions on the order of milliseconds. I found that the results are inconsistent even though the underlying DecisionTreeModel instances are exactly the same, so the behavior of the two implementations is inconsistent, which should not be the case. The code below can be used to reproduce the issue. It can be run as a simple Java app as long as you have the Spark dependencies set up properly.

{noformat}
import org.apache.spark.ml.Transformer;
import org.apache.spark.ml.classification.*;
import org.apache.spark.ml.linalg.*;
import org.apache.spark.ml.regression.RandomForestRegressionModel;
import org.apache.spark.mllib.linalg.DenseVector;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.tree.configuration.Algo;
import org.apache.spark.mllib.tree.model.DecisionTreeModel;
import org.apache.spark.mllib.tree.model.RandomForestModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import scala.Enumeration;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

abstract class Predictor {
    abstract double predict(Vector vector);
}

public class MainConvertModels {
    public static final int seed = 42;

    public static void main(String[] args) {
        int numRows = 1000;
        int numFeatures = 3;
        int numClasses = 2;
        double trainFraction = 0.8;
        double testFraction = 0.2;
        SparkSession spark = SparkSession.builder()
                .appName("conversion app")
                .master("local")
                .getOrCreate();

        Dataset data = getDummyData(spark, numRows, numFeatures, numClasses);
        Dataset[] splits = data.randomSplit(new double[]{trainFraction, testFraction}, seed);
        Dataset trainingData = splits[0];
        Dataset testData = splits[1];
        testData.cache();
        List labels = getLabels(testData);
        List features = getFeatures(testData);

        DecisionTreeClassifier classifier1 = new DecisionTreeClassifier();
        DecisionTreeClassificationModel model1 = classifier1.fit(trainingData);
        final DecisionTreeModel convertedModel1 = convertDecisionTreeModel(model1, Algo.Classification());

        RandomForestClassifier classifier = new RandomForestClassifier();
        RandomForestClassificationModel model2 = classifier.fit(trainingData);
        final RandomForestModel convertedModel2 = convertRandomForestModel(model2);

        System.out.println(
                "** DecisionTreeClassifier\n" +
                "** Original **" + getInfo(model1, testData) + "\n" +
                "** New **" + getInfo(new Predictor() {
                    double predict(Vector vector) {return convertedModel1.predict(vector);}
                }, labels, features) + "\n" +
                "\n" +
                "** RandomForestClassifier\n" +
                "** Original **" + getInfo(model2, testData) + "\n" +
                "** New **" + getInfo(new Predictor() {double predict(Vector vector) {return convertedModel2.predict(vector);}}, labels, features) + "\n" +
                "\n" +
                "");
    }

    static Dataset getDummyData(SparkSession spark, int numberRows, int numberFeatures, int labelUpperBound) {
        StructType schema = new StructType(new StructField[]{
                new StructField("label", DataTypes.DoubleType, false, Metadata.empty()),
                new StructField("features", new VectorUDT(), false, Metadata.empty())
        });
        double[][] vectors = prepareData(numberRows, numberFeatures);
        Random random = new Random(seed);
        List dataTest = new ArrayList<>();
        for (double[] vector : vectors) {
            double label = (double) random.nextInt(2);
            dataTest.add(RowFactory.create(label, Vectors.dense(vector)));
        }
        return spark.createDataFrame(dataTest, schema);
    }

    static double[][] prepareData(int numRows, int numFeatures) {
        Random random = new Random(seed);
        double[][] result = new double[numRows][numFeatures];
        for (int row = 0; row < numRows; row++) {
            for (int feature = 0; feature < numFeatures; feature++) {
                result[row][feature] = random.nextDouble();
            }
        }
        return result;
    }

    static S
[jira] [Updated] (SPARK-19449) Inconsistent results between ml package RandomForestClassificationModel and mllib package RandomForestModel
[ https://issues.apache.org/jira/browse/SPARK-19449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aseem Bansal updated SPARK-19449:
-
Description: I worked on some code to convert the ml package RandomForestClassificationModel to the mllib package RandomForestModel. It was needed because we need to make predictions on the order of milliseconds. I found that the results are inconsistent even though the underlying DecisionTreeModel instances are exactly the same. The issue description includes code that reproduces the problem; it can be run as a simple Java app as long as you have the Spark dependencies set up properly.
[jira] [Updated] (SPARK-19449) Inconsistent results between ml package RandomForestClassificationModel and mllib package RandomForestModel
[ https://issues.apache.org/jira/browse/SPARK-19449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aseem Bansal updated SPARK-19449:
-
Description: I worked on some code to convert the ml package RandomForestClassificationModel to the mllib package RandomForestModel. It was needed because we need to make predictions on the order of milliseconds. I found that the results are inconsistent even though the underlying DecisionTreeModel instances are exactly the same, so the behavior of the two implementations is inconsistent, which should not be the case. The issue description includes code that reproduces the problem; it can be run as a simple Java app as long as you have the Spark dependencies set up properly.
[jira] [Created] (SPARK-19449) Inconsistent results between ml package RandomForestClassificationModel and mllib package RandomForestModel
Aseem Bansal created SPARK-19449:
Summary: Inconsistent results between ml package RandomForestClassificationModel and mllib package RandomForestModel
Key: SPARK-19449
URL: https://issues.apache.org/jira/browse/SPARK-19449
Project: Spark
Issue Type: Bug
Components: ML, MLlib
Affects Versions: 2.1.0
Reporter: Aseem Bansal

I worked on some code to convert the ml package RandomForestClassificationModel to the mllib package RandomForestModel. It was needed because we need to make predictions on the order of milliseconds. I found that the results are inconsistent even though the underlying DecisionTreeModel instances are exactly the same. The code below can be used to reproduce the issue.

{noformat}
import org.apache.spark.ml.Transformer;
import org.apache.spark.ml.classification.*;
import org.apache.spark.ml.linalg.*;
import org.apache.spark.ml.regression.RandomForestRegressionModel;
import org.apache.spark.mllib.linalg.DenseVector;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.tree.configuration.Algo;
import org.apache.spark.mllib.tree.model.DecisionTreeModel;
import org.apache.spark.mllib.tree.model.RandomForestModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import scala.Enumeration;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

abstract class Predictor {
    abstract double predict(Vector vector);
}

public class MainConvertModels {
    public static final int seed = 42;

    public static void main(String[] args) {
        int numRows = 1000;
        int numFeatures = 3;
        int numClasses = 2;
        double trainFraction = 0.8;
        double testFraction = 0.2;
        SparkSession spark = SparkSession.builder()
                .appName("conversion app")
                .master("local")
                .getOrCreate();

        //Dataset data = getData(spark, "libsvm", "/opt/spark2/data/mllib/sample_libsvm_data.txt");
        Dataset data = getDummyData(spark, numRows, numFeatures, numClasses);
        Dataset[] splits = data.randomSplit(new double[]{trainFraction, testFraction}, seed);
        Dataset trainingData = splits[0];
        Dataset testData = splits[1];
        testData.cache();
        List labels = getLabels(testData);
        List features = getFeatures(testData);

        DecisionTreeClassifier classifier1 = new DecisionTreeClassifier();
        DecisionTreeClassificationModel model1 = classifier1.fit(trainingData);
        final DecisionTreeModel convertedModel1 = convertDecisionTreeModel(model1, Algo.Classification());

        RandomForestClassifier classifier = new RandomForestClassifier();
        RandomForestClassificationModel model2 = classifier.fit(trainingData);
        final RandomForestModel convertedModel2 = convertRandomForestModel(model2);

        LogisticRegression lr = new LogisticRegression();
        LogisticRegressionModel model3 = lr.fit(trainingData);
        final org.apache.spark.mllib.classification.LogisticRegressionModel convertedModel3 = convertLogisticRegressionModel(model3);

        System.out.println(
                "** DecisionTreeClassifier\n" +
                "** Original **" + getInfo(model1, testData) + "\n" +
                "** New **" + getInfo(new Predictor() {
                    double predict(Vector vector) {return convertedModel1.predict(vector);}
                }, labels, features) + "\n" +
                "\n" +
                "** RandomForestClassifier\n" +
                "** Original **" + getInfo(model2, testData) + "\n" +
                "** New **" + getInfo(new Predictor() {double predict(Vector vector) {return convertedModel2.predict(vector);}}, labels, features) + "\n" +
                "\n" +
                "** LogisticRegression\n" +
                "** Original **" + getInfo(model3, testData) + "\n" +
                "** New **" + getInfo(new Predictor() {double predict(Vector vector) {return convertedModel3.predict(vector);}}, labels, features) + "\n" +
                "");
    }

    static Dataset getData(SparkSession spark, String format, String location) {
        return spark.read()
                .format(format)
                .load(location);
    }

    static Dataset getDummyData(SparkSession spark, int numberRows, int numberFeatures, int labelUpperBound) {
        StructType schema = new StructType(new StructField[]{
                new StructField("label", DataTypes.DoubleType, false, Metadata.empty()), n
[jira] [Created] (SPARK-19445) Please remove tylerchap...@yahoo-inc.com subscription from u...@spark.apache.org
Aseem Bansal created SPARK-19445:
Summary: Please remove tylerchap...@yahoo-inc.com subscription from u...@spark.apache.org
Key: SPARK-19445
URL: https://issues.apache.org/jira/browse/SPARK-19445
Project: Spark
Issue Type: IT Help
Components: Project Infra
Affects Versions: 2.1.0
Reporter: Aseem Bansal

Whenever a mail is sent to u...@spark.apache.org, I receive this email:

{noformat}
This is an automatically generated message.

tylerchap...@yahoo-inc.com is no longer with Yahoo! Inc.

Your message will not be forwarded.

If you have a sales inquiry, please email yahoosa...@yahoo-inc.com and someone will follow up with you shortly.

If you require assistance with a legal matter, please send a message to legal-noti...@yahoo-inc.com

Thank you!
{noformat}

It is clear that this user is no longer available. Please remove this email address from the mailing list so that we don't get so much spam.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19444) Tokenizer example does not compile without extra imports
[ https://issues.apache.org/jira/browse/SPARK-19444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851382#comment-15851382 ] Aseem Bansal commented on SPARK-19444:
--
[~srowen] I can find the source at https://github.com/apache/spark/blob/master/docs/ml-features.md which led me to https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java#L40 but what is $example on:untyped_ops$? The imports are there in the example source, but something seems broken here, so this is probably a parsing issue?

> Tokenizer example does not compile without extra imports
>
> Key: SPARK-19444
> URL: https://issues.apache.org/jira/browse/SPARK-19444
> Project: Spark
> Issue Type: Documentation
> Components: Documentation
> Affects Versions: 2.1.0
> Reporter: Aseem Bansal
> Priority: Minor
>
> The example at http://spark.apache.org/docs/2.1.0/ml-features.html#tokenizer
> does not compile without the following static import:
> import static org.apache.spark.sql.functions.*;
[jira] [Updated] (SPARK-19444) Tokenizer example does not compile without extra imports
[ https://issues.apache.org/jira/browse/SPARK-19444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aseem Bansal updated SPARK-19444: - Priority: Minor (was: Major) > Tokenizer example does not compile without extra imports > > > Key: SPARK-19444 > URL: https://issues.apache.org/jira/browse/SPARK-19444 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.1.0 >Reporter: Aseem Bansal >Priority: Minor > > The example at http://spark.apache.org/docs/2.1.0/ml-features.html#tokenizer > does not compile without the following static import > import static org.apache.spark.sql.functions.*; -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19444) Tokenizer example does not compile without extra imports
Aseem Bansal created SPARK-19444: Summary: Tokenizer example does not compile without extra imports Key: SPARK-19444 URL: https://issues.apache.org/jira/browse/SPARK-19444 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 2.1.0 Reporter: Aseem Bansal The example at http://spark.apache.org/docs/2.1.0/ml-features.html#tokenizer does not compile without the following static import import static org.apache.spark.sql.functions.*; -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-19410) Links to API documentation are broken
Aseem Bansal created an issue: Spark / SPARK-19410
Summary: Links to API documentation are broken
Issue Type: Documentation
Components: Documentation
Affects Versions: 2.1.0
Assignee: Unassigned
Created: 31/Jan/17 08:55
Priority: Major
Reporter: Aseem Bansal

I was looking at https://spark.apache.org/docs/latest/ml-pipeline.html#example-estimator-transformer-and-param and saw that the links to the API documentation are broken.
[jira] [Created] (ZOOKEEPER-2657) Using zookeeper without SASL causes error logging
Aseem Bansal created ZOOKEEPER-2657:
---
Summary: Using zookeeper without SASL causes error logging
Key: ZOOKEEPER-2657
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2657
Project: ZooKeeper
Issue Type: Improvement
Affects Versions: 3.4.6
Reporter: Aseem Bansal

We are using Kafka, which uses ZooKeeper, but we are not using SASL, so we keep getting the following:

{noformat}
CRITICAL: Found 32 lines (limit=1/1): (1) 2016-12-16 07:02:14.780 [INFO ] [r] org.apache.zookeeper.ClientCnxn [] - Opening socket connection to server 10.0.1.47/10.0.1.47:2181. Will not attempt to authenticate using SASL (unknown error)
{noformat}

I found http://stackoverflow.com/a/26532778/2235567, and based on it looked at https://svn.apache.org/repos/asf/zookeeper/trunk/src/java/main/org/apache/zookeeper/client/ZooKeeperSaslClient.java. Searching for "Will not attempt to authenticate using SASL" leads to the "(unknown error)" part of the message. Can the message be changed so that the word "error" does not appear, as it is not really an error?
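Until the wording changes upstream, one possible workaround on the monitoring side is to quiet the logger that emits this line. A hedged sketch of a log4j.properties entry; the logger name is taken from the quoted log line, and whether this fits depends on your logging setup:

```properties
# Raise the ClientCnxn logger above INFO so the benign
# "Will not attempt to authenticate using SASL (unknown error)" line is not logged.
log4j.logger.org.apache.zookeeper.ClientCnxn=WARN
```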
[jira] [Comment Edited] (SPARK-10413) Model should support prediction on single instance
[ https://issues.apache.org/jira/browse/SPARK-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734495#comment-15734495 ] Aseem Bansal edited comment on SPARK-10413 at 12/9/16 6:39 AM: --- Hi Is anyone working on this? And is there a JIRA ticket for having a predict method on PipelineModel? was (Author: anshbansal): Hi Is anyone working on this? > Model should support prediction on single instance > -- > > Key: SPARK-10413 > URL: https://issues.apache.org/jira/browse/SPARK-10413 > Project: Spark > Issue Type: Umbrella > Components: ML >Reporter: Xiangrui Meng >Priority: Critical > > Currently models in the pipeline API only implement transform(DataFrame). It > would be quite useful to support prediction on single instance. > UPDATE: This issue is for making predictions with single models. We can make > methods like {{def predict(features: Vector): Double}} public. > * This issue is *not* for single-instance prediction for full Pipelines, > which would require making predictions on {{Row}}s. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10413) Model should support prediction on single instance
[ https://issues.apache.org/jira/browse/SPARK-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734495#comment-15734495 ] Aseem Bansal commented on SPARK-10413: -- Hi Is anyone working on this? > Model should support prediction on single instance > -- > > Key: SPARK-10413 > URL: https://issues.apache.org/jira/browse/SPARK-10413 > Project: Spark > Issue Type: Umbrella > Components: ML >Reporter: Xiangrui Meng >Priority: Critical > > Currently models in the pipeline API only implement transform(DataFrame). It > would be quite useful to support prediction on single instance. > UPDATE: This issue is for making predictions with single models. We can make > methods like {{def predict(features: Vector): Double}} public. > * This issue is *not* for single-instance prediction for full Pipelines, > which would require making predictions on {{Row}}s. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18241) If Spark Launcher fails to startApplication then handle's state does not change
[ https://issues.apache.org/jira/browse/SPARK-18241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15631898#comment-15631898 ] Aseem Bansal commented on SPARK-18241:
--
Looking at the source code after mainClass = Utils.classForName(childMainClass) at https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L695, I see that the exceptions are being printed instead of being thrown or sent to the listeners. The API says that startApplication is preferred, but these failures need to be surfaced via the handle's listeners; otherwise the listener API is not useful. Another case where failures are not sent via the Launcher API: https://issues.apache.org/jira/browse/SPARK-17742

> If Spark Launcher fails to startApplication then handle's state does not
> change
> ---
>
> Key: SPARK-18241
> URL: https://issues.apache.org/jira/browse/SPARK-18241
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 2.0.0
> Reporter: Aseem Bansal
>
> I am using Spark 2.0.0. I am using
> https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/launcher/SparkLauncher.html
> to submit my job.
> If there is a failure after the launcher's startApplication has been called but
> before the Spark job has actually started (i.e. in starting the Spark process
> that submits the job itself), there is
> * no exception in the main thread that is submitting the job
> * no exception in the job, as it has not started
> * no state change of the launcher
> * the exception is logged in the error stream on the default logger name that
> Spark produces using the job's main class.
> Basically, it is not possible to catch an exception if it happens during that
> time. The easiest way to reproduce it is to delete the JAR file or use an
> invalid Spark home while launching the job using SparkLauncher.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-18241) If Spark Launcher fails to startApplication then handle's state does not change
Aseem Bansal created SPARK-18241: Summary: If Spark Launcher fails to startApplication then handle's state does not change Key: SPARK-18241 URL: https://issues.apache.org/jira/browse/SPARK-18241 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 2.0.0 Reporter: Aseem Bansal I am using Spark 2.0.0. I am using https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/launcher/SparkLauncher.html to submit my job. If there is a failure after the launcher's startApplication has been called but before the Spark job has actually started (i.e. in starting the Spark process that submits the job itself) there is
* no exception in the main thread that is submitting the job
* no exception in the job, as it has not started
* no state change of the launcher
* the exception is logged in the error stream on the default logger name that Spark produces using the job's main class.
Basically, it is not possible to catch an exception if it happens during that time. The easiest way to reproduce it is to delete the JAR file or use an invalid Spark home while launching the job using SparkLauncher. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
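Until the launcher surfaces these early failures, one defensive workaround is to treat "no terminal state within a deadline" as a launch failure. A minimal sketch, in which the Spark home, app resource path, main class, and the two-minute deadline are illustrative assumptions rather than values from the report:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LaunchWithDeadline {
    public static void main(String[] args) throws Exception {
        CountDownLatch done = new CountDownLatch(1);
        SparkAppHandle handle = new SparkLauncher()
                .setSparkHome("/opt/spark2")                 // assumed path
                .setAppResource("/path/to/job.jar")          // assumed path
                .setMainClass("com.example.Main")            // assumed class
                .setMaster("local[2]")
                .startApplication(new SparkAppHandle.Listener() {
                    @Override
                    public void stateChanged(SparkAppHandle h) {
                        // isFinal() covers FINISHED, FAILED, KILLED, LOST
                        if (h.getState().isFinal()) {
                            done.countDown();
                        }
                    }
                    @Override
                    public void infoChanged(SparkAppHandle h) { }
                });

        // If the spark-submit process died before the app ever connected back,
        // no state change arrives, so a timeout is the only failure signal.
        if (!done.await(2, TimeUnit.MINUTES)) {
            handle.kill();
            throw new IllegalStateException(
                    "No terminal state within the deadline; "
                    + "assuming the launch itself failed.");
        }
    }
}
```

This does not recover the underlying exception (which, per the report, only appears in the child process's error stream); it merely turns a silent hang into a detectable failure.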
[jira] [Comment Edited] (SPARK-17742) Spark Launcher does not get failed state in Listener
[ https://issues.apache.org/jira/browse/SPARK-17742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15535315#comment-15535315 ] Aseem Bansal edited comment on SPARK-17742 at 9/30/16 7:35 AM: --- I dug into the launcher code to see if I can figure out how it is working and see if I could find the bug. But when I reached LauncherServer's ServerConnection's handle method and found that this is socket programming I found it harder to find where the messages are coming from. Still trying to figure out but maybe someone who knows spark code better will find it easier to find the bug. was (Author: anshbansal): I dug into the launcher code to see if I can figure out how it is working and see if I could find the bug. But when I reached LauncherServer's ServerConnection's handle method and found that this is socket programming I found it harder to find where the messages are coming from. Still trying to figure out maybe someone who knows spark code better will find it easier to find the bug. > Spark Launcher does not get failed state in Listener > - > > Key: SPARK-17742 > URL: https://issues.apache.org/jira/browse/SPARK-17742 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.0 >Reporter: Aseem Bansal > > I tried to launch an application using the below code. This is dummy code to > reproduce the problem. I tried exiting spark with status -1, throwing an > exception etc. but in no case did the listener give me failed status. But if > a spark job returns -1 or throws an exception from the main method it should > be considered as a failure. 
> {code} > package com.example; > import org.apache.spark.launcher.SparkAppHandle; > import org.apache.spark.launcher.SparkLauncher; > import java.io.IOException; > public class Main2 { > public static void main(String[] args) throws IOException, > InterruptedException { > SparkLauncher launcher = new SparkLauncher() > .setSparkHome("/opt/spark2") > > .setAppResource("/home/aseem/projects/testsparkjob/build/libs/testsparkjob-1.0-SNAPSHOT.jar") > .setMainClass("com.example.Main") > .setMaster("local[2]"); > launcher.startApplication(new MyListener()); > Thread.sleep(1000 * 60); > } > } > class MyListener implements SparkAppHandle.Listener { > @Override > public void stateChanged(SparkAppHandle handle) { > System.out.println("state changed " + handle.getState()); > } > @Override > public void infoChanged(SparkAppHandle handle) { > System.out.println("info changed " + handle.getState()); > } > } > {code} > The spark job is > {code} > package com.example; > import org.apache.spark.sql.SparkSession; > import java.io.IOException; > public class Main { > public static void main(String[] args) throws IOException { > SparkSession sparkSession = SparkSession > .builder() > .appName("" + System.currentTimeMillis()) > .getOrCreate(); > try { > for (int i = 0; i < 15; i++) { > Thread.sleep(1000); > System.out.println("sleeping 1"); > } > } catch (InterruptedException e) { > e.printStackTrace(); > } > //sparkSession.stop(); > System.exit(-1); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17742) Spark Launcher does not get failed state in Listener
[ https://issues.apache.org/jira/browse/SPARK-17742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15535315#comment-15535315 ] Aseem Bansal commented on SPARK-17742: -- I dug into the launcher code to see if I can figure out how it is working and see if I could find the bug. But when I reached LauncherServer's ServerConnection's handle method and found that this is socket programming I found it harder to find where the messages are coming from. Still trying to figure out maybe someone who knows spark code better will find it easier to find the bug. > Spark Launcher does not get failed state in Listener > - > > Key: SPARK-17742 > URL: https://issues.apache.org/jira/browse/SPARK-17742 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.0 >Reporter: Aseem Bansal > > I tried to launch an application using the below code. This is dummy code to > reproduce the problem. I tried exiting spark with status -1, throwing an > exception etc. but in no case did the listener give me failed status. But if > a spark job returns -1 or throws an exception from the main method it should > be considered as a failure. 
> {code} > package com.example; > import org.apache.spark.launcher.SparkAppHandle; > import org.apache.spark.launcher.SparkLauncher; > import java.io.IOException; > public class Main2 { > public static void main(String[] args) throws IOException, > InterruptedException { > SparkLauncher launcher = new SparkLauncher() > .setSparkHome("/opt/spark2") > > .setAppResource("/home/aseem/projects/testsparkjob/build/libs/testsparkjob-1.0-SNAPSHOT.jar") > .setMainClass("com.example.Main") > .setMaster("local[2]"); > launcher.startApplication(new MyListener()); > Thread.sleep(1000 * 60); > } > } > class MyListener implements SparkAppHandle.Listener { > @Override > public void stateChanged(SparkAppHandle handle) { > System.out.println("state changed " + handle.getState()); > } > @Override > public void infoChanged(SparkAppHandle handle) { > System.out.println("info changed " + handle.getState()); > } > } > {code} > The spark job is > {code} > package com.example; > import org.apache.spark.sql.SparkSession; > import java.io.IOException; > public class Main { > public static void main(String[] args) throws IOException { > SparkSession sparkSession = SparkSession > .builder() > .appName("" + System.currentTimeMillis()) > .getOrCreate(); > try { > for (int i = 0; i < 15; i++) { > Thread.sleep(1000); > System.out.println("sleeping 1"); > } > } catch (InterruptedException e) { > e.printStackTrace(); > } > //sparkSession.stop(); > System.exit(-1); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17742) Spark Launcher does not get failed state in Listener
Aseem Bansal created SPARK-17742: Summary: Spark Launcher does not get failed state in Listener Key: SPARK-17742 URL: https://issues.apache.org/jira/browse/SPARK-17742 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 2.0.0 Reporter: Aseem Bansal I tried to launch an application using the below code. This is dummy code to reproduce the problem. I tried exiting Spark with status -1, throwing an exception, etc., but in no case did the listener give me a failed status. But if a Spark job returns -1 or throws an exception from the main method, it should be considered a failure.

{code}
package com.example;

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

import java.io.IOException;

public class Main2 {
    public static void main(String[] args) throws IOException, InterruptedException {
        SparkLauncher launcher = new SparkLauncher()
                .setSparkHome("/opt/spark2")
                .setAppResource("/home/aseem/projects/testsparkjob/build/libs/testsparkjob-1.0-SNAPSHOT.jar")
                .setMainClass("com.example.Main")
                .setMaster("local[2]");
        launcher.startApplication(new MyListener());
        Thread.sleep(1000 * 60);
    }
}

class MyListener implements SparkAppHandle.Listener {
    @Override
    public void stateChanged(SparkAppHandle handle) {
        System.out.println("state changed " + handle.getState());
    }

    @Override
    public void infoChanged(SparkAppHandle handle) {
        System.out.println("info changed " + handle.getState());
    }
}
{code}

The Spark job is

{code}
package com.example;

import org.apache.spark.sql.SparkSession;

import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {
        SparkSession sparkSession = SparkSession
                .builder()
                .appName("" + System.currentTimeMillis())
                .getOrCreate();
        try {
            for (int i = 0; i < 15; i++) {
                Thread.sleep(1000);
                System.out.println("sleeping 1");
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        //sparkSession.stop();
        System.exit(-1);
    }
}
{code}

-- This message was sent by
Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17560) SQLContext tables returns table names in lower case only
[ https://issues.apache.org/jira/browse/SPARK-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495960#comment-15495960 ] Aseem Bansal commented on SPARK-17560: -- Can you share where this option needs to be set? Maybe I can try and add a pull request unless it is easier for you to just add a PR yourself instead of explaining. > SQLContext tables returns table names in lower case only > > > Key: SPARK-17560 > URL: https://issues.apache.org/jira/browse/SPARK-17560 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Aseem Bansal > > I registered a table using > dataSet.createOrReplaceTempView("TestTable"); > Then I tried to get the list of tables using > sparkSession.sqlContext().tableNames() > but the name that I got was testtable. It used to give table names in proper > case in Spark 1.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17560) SQLContext tables returns table names in lower case only
[ https://issues.apache.org/jira/browse/SPARK-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495906#comment-15495906 ] Aseem Bansal commented on SPARK-17560: -- Looked through https://spark.apache.org/docs/2.0.0/sql-programming-guide.html https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/Dataset.html https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/SparkSession.html https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/SparkConf.html and none of them say anything about this parameter > SQLContext tables returns table names in lower case only > > > Key: SPARK-17560 > URL: https://issues.apache.org/jira/browse/SPARK-17560 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Aseem Bansal > > I registered a table using > dataSet.createOrReplaceTempView("TestTable"); > Then I tried to get the list of tables using > sparkSession.sqlContext().tableNames() > but the name that I got was testtable. It used to give table names in proper > case in Spark 1.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17561) DataFrameWriter documentation formatting problems
[ https://issues.apache.org/jira/browse/SPARK-17561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aseem Bansal updated SPARK-17561: - Description: I visited this page https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/DataFrameWriter.html and saw that the docs have formatting problems !screenshot-1.png! was: I visited this page https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/DataFrameWriter.html and saw that the docs have formatting problems > DataFrameWriter documentation formatting problems > - > > Key: SPARK-17561 > URL: https://issues.apache.org/jira/browse/SPARK-17561 > Project: Spark > Issue Type: Documentation >Reporter: Aseem Bansal > Attachments: screenshot-1.png > > > I visited this page > https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/DataFrameWriter.html > and saw that the docs have formatting problems > !screenshot-1.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17561) DataFrameWriter documentation formatting problems
[ https://issues.apache.org/jira/browse/SPARK-17561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aseem Bansal updated SPARK-17561: - Description: I visited this page https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/DataFrameWriter.html and saw that the docs have formatting problems !screenshot-1.png! Tried with browser cache disabled. Same issue was: I visited this page https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/DataFrameWriter.html and saw that the docs have formatting problems !screenshot-1.png! > DataFrameWriter documentation formatting problems > - > > Key: SPARK-17561 > URL: https://issues.apache.org/jira/browse/SPARK-17561 > Project: Spark > Issue Type: Documentation >Reporter: Aseem Bansal > Attachments: screenshot-1.png > > > I visited this page > https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/DataFrameWriter.html > and saw that the docs have formatting problems > !screenshot-1.png! > Tried with browser cache disabled. Same issue -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17561) DataFrameWriter documentation formatting problems
[ https://issues.apache.org/jira/browse/SPARK-17561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aseem Bansal updated SPARK-17561: - Attachment: screenshot-1.png > DataFrameWriter documentation formatting problems > - > > Key: SPARK-17561 > URL: https://issues.apache.org/jira/browse/SPARK-17561 > Project: Spark > Issue Type: Documentation >Reporter: Aseem Bansal > Attachments: screenshot-1.png > > > I visited this page > https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/DataFrameWriter.html > and saw that the docs have formatting problems -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17561) DataFrameWriter documentation formatting problems
Aseem Bansal created SPARK-17561: Summary: DataFrameWriter documentation formatting problems Key: SPARK-17561 URL: https://issues.apache.org/jira/browse/SPARK-17561 Project: Spark Issue Type: Documentation Reporter: Aseem Bansal Attachments: screenshot-1.png I visited this page https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/DataFrameWriter.html and saw that the docs have formatting problems -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17560) SQLContext tables returns table names in lower case only
[ https://issues.apache.org/jira/browse/SPARK-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495862#comment-15495862 ] Aseem Bansal commented on SPARK-17560: -- No I did not. Where? > SQLContext tables returns table names in lower case only > > > Key: SPARK-17560 > URL: https://issues.apache.org/jira/browse/SPARK-17560 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Aseem Bansal > > I registered a table using > dataSet.createOrReplaceTempView("TestTable"); > Then I tried to get the list of tables using > sparkSession.sqlContext().tableNames() > but the name that I got was testtable. It used to give table names in proper > case in Spark 1.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-17560) SQLContext tables returns table names in lower case only
[ https://issues.apache.org/jira/browse/SPARK-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495862#comment-15495862 ] Aseem Bansal edited comment on SPARK-17560 at 9/16/16 9:38 AM: --- No I did not. Where? Had not set that in Spark 1.4 either was (Author: anshbansal): No I did not. Where? > SQLContext tables returns table names in lower case only > > > Key: SPARK-17560 > URL: https://issues.apache.org/jira/browse/SPARK-17560 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Aseem Bansal > > I registered a table using > dataSet.createOrReplaceTempView("TestTable"); > Then I tried to get the list of tables using > sparkSession.sqlContext().tableNames() > but the name that I got was testtable. It used to give table names in proper > case in Spark 1.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17560) SQLContext tables returns table names in lower case only
Aseem Bansal created SPARK-17560: Summary: SQLContext tables returns table names in lower case only Key: SPARK-17560 URL: https://issues.apache.org/jira/browse/SPARK-17560 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Reporter: Aseem Bansal I registered a table using dataSet.createOrReplaceTempView("TestTable"); Then I tried to get the list of tables using sparkSession.sqlContext().tableNames() but the name that I got was testtable. It used to give table names in proper case in Spark 1.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
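The configuration option the later comments ask about is `spark.sql.caseSensitive`, which controls how Spark SQL compares identifiers. A minimal sketch of setting it at session build time follows; whether it restores the original casing of temp-view names returned by `tableNames()` on 2.0.0 is exactly what this issue questions, so treat this as an experiment rather than a confirmed fix:

```java
import org.apache.spark.sql.SparkSession;

public class CaseSensitiveTableNames {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("case sensitivity test")
                .master("local[2]")
                // Makes identifier resolution case sensitive; it is not
                // documented as affecting how registered names are stored.
                .config("spark.sql.caseSensitive", "true")
                .getOrCreate();

        spark.range(1).createOrReplaceTempView("TestTable");
        for (String name : spark.sqlContext().tableNames()) {
            System.out.println(name);
        }
        spark.stop();
    }
}
```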
[jira] [Commented] (SPARK-17307) Document what all access is needed on S3 bucket when trying to save a model
[ https://issues.apache.org/jira/browse/SPARK-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1540#comment-1540 ] Aseem Bansal commented on SPARK-17307: -- Not adding it there would be fine. But there needs to be something. Also, for contributions, I tried searching for the file but could not find it. In which branch are you working? > Document what all access is needed on S3 bucket when trying to save a model > --- > > Key: SPARK-17307 > URL: https://issues.apache.org/jira/browse/SPARK-17307 > Project: Spark > Issue Type: Documentation >Reporter: Aseem Bansal >Priority: Minor > > I faced this lack of documentation when I was trying to save a model to S3. > Initially I thought it should be only write. Then I found it also needs > delete to delete temporary files. Now I requested access for delete and tried > again and I am getting the error > Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: > org.jets3t.service.S3ServiceException: S3 PUT failed for > '/dev-qa_%24folder%24' XML Error Message > To reproduce this error the below can be used > {code} > SparkSession sparkSession = SparkSession > .builder() > .appName("my app") > .master("local") > .getOrCreate(); > JavaSparkContext jsc = new > JavaSparkContext(sparkSession.sparkContext()); > jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", ); > jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", <SECRET ACCESS KEY>); > //Create a PipelineModel > > pipelineModel.write().overwrite().save("s3n:///dev-qa/modelTest"); > {code} > This back and forth could be avoided if it was clearly mentioned what all > access spark needs to write to S3. Also would be great if why all of the > access is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17307) Document what all access is needed on S3 bucket when trying to save a model
[ https://issues.apache.org/jira/browse/SPARK-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15454791#comment-15454791 ] Aseem Bansal commented on SPARK-17307: -- I would add that bit of information at http://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/util/MLWritable.html#save(java.lang.String) Something like it needs complete read write access when using with S3 should be enough. > Document what all access is needed on S3 bucket when trying to save a model > --- > > Key: SPARK-17307 > URL: https://issues.apache.org/jira/browse/SPARK-17307 > Project: Spark > Issue Type: Documentation >Reporter: Aseem Bansal >Priority: Minor > > I faced this lack of documentation when I was trying to save a model to S3. > Initially I thought it should be only write. Then I found it also needs > delete to delete temporary files. Now I requested access for delete and tried > again and I am get the error > Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: > org.jets3t.service.S3ServiceException: S3 PUT failed for > '/dev-qa_%24folder%24' XML Error Message > To reproduce this error the below can be used > {code} > SparkSession sparkSession = SparkSession > .builder() > .appName("my app") > .master("local") > .getOrCreate(); > JavaSparkContext jsc = new > JavaSparkContext(sparkSession.sparkContext()); > jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", ); > jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", ACCESS KEY>); > //Create a Pipelinemode > > pipelineModel.write().overwrite().save("s3n:///dev-qa/modelTest"); > {code} > This back and forth could be avoided if it was clearly mentioned what all > access spark needs to write to S3. Also would be great if why all of the > access is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17307) Document what all access is needed on S3 bucket when trying to save a model
Aseem Bansal created SPARK-17307: Summary: Document what all access is needed on S3 bucket when trying to save a model Key: SPARK-17307 URL: https://issues.apache.org/jira/browse/SPARK-17307 Project: Spark Issue Type: Documentation Reporter: Aseem Bansal I faced this lack of documentation when I was trying to save a model to S3. Initially I thought it should be only write. Then I found it also needs delete to delete temporary files. Now I requested access for delete, tried again, and am getting the error

Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 PUT failed for '/dev-qa_%24folder%24' XML Error Message

To reproduce this error the below can be used

{code}
SparkSession sparkSession = SparkSession
        .builder()
        .appName("my app")
        .master("local")
        .getOrCreate();

JavaSparkContext jsc = new JavaSparkContext(sparkSession.sparkContext());
jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", );
jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", );

//Create a PipelineModel
pipelineModel.write().overwrite().save("s3n:///dev-qa/modelTest");
{code}

This back and forth could be avoided if it was clearly mentioned what all access Spark needs to write to S3. It would also be great to explain why all of that access is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
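For reference, the back-and-forth described above converges on write plus delete (for the temporary `_$folder$` markers) plus list access. An illustrative IAM policy along those lines is sketched below; the bucket name is a placeholder and this is an assumption about the minimum set of actions, not an officially documented requirement:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::<bucket>"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::<bucket>/*"]
    }
  ]
}
```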
[jira] [Updated] (SPARK-17012) Reading data frames via CSV - Allow to specify default value for integers
[ https://issues.apache.org/jira/browse/SPARK-17012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aseem Bansal updated SPARK-17012: - Description: Currently the option that we have in DataFrameReader is nullValue which allows us one default. But say in our data frame we have string and integers and we want to specify the default for strings and integers differently that is currently not possible. If it is done for different data types then it should be possible to allow to specify the schema to be nullable false when inferring schema (as a new option). was:Currently the option that we have in DataFrameReader is nullValue which allows us one default. But say in our data frame we have string and integers and we want to specify the default for strings and integers differently that is currently not possible. > Reading data frames via CSV - Allow to specify default value for integers > - > > Key: SPARK-17012 > URL: https://issues.apache.org/jira/browse/SPARK-17012 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Aseem Bansal > > Currently the option that we have in DataFrameReader is nullValue which > allows us one default. But say in our data frame we have string and integers > and we want to specify the default for strings and integers differently that > is currently not possible. > If it is done for different data types then it should be possible to allow to > specify the schema to be nullable false when inferring schema (as a new > option). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17012) Reading data frames via CSV - Allow to specify default value for integers
Aseem Bansal created SPARK-17012: Summary: Reading data frames via CSV - Allow to specify default value for integers Key: SPARK-17012 URL: https://issues.apache.org/jira/browse/SPARK-17012 Project: Spark Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Aseem Bansal Currently the option that we have in DataFrameReader is nullValue, which allows us one default. But say our data frame has strings and integers and we want to specify the default for strings and integers differently; that is currently not possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
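In the meantime, per-column (and hence effectively per-type) defaults can be approximated after the read: `nullValue` turns the chosen token into null everywhere, and `na().fill()` then substitutes a different default per column. A minimal sketch; the file path and column names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PerColumnCsvDefaults {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("csv defaults")
                .master("local[2]")
                .getOrCreate();

        // nullValue maps the empty string to null in every column on read.
        Dataset<Row> df = spark.read()
                .option("header", "true")
                .option("nullValue", "")
                .csv("/path/to/data.csv");   // hypothetical path

        // na().fill() then applies a different default per column,
        // which approximates per-type defaults until the reader supports them.
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("someStringColumn", "unknown"); // hypothetical column
        defaults.put("someIntColumn", 0);            // hypothetical column
        Dataset<Row> withDefaults = df.na().fill(defaults);

        withDefaults.show();
        spark.stop();
    }
}
```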
[jira] [Commented] (SPARK-16893) Spark CSV Provider option is not documented
[ https://issues.apache.org/jira/browse/SPARK-16893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409260#comment-15409260 ] Aseem Bansal commented on SPARK-16893: -- Yes. I would expect it to work without the use of format function as spark's documentation does not tell me anything about the need to use the format when using the csv function. > Spark CSV Provider option is not documented > --- > > Key: SPARK-16893 > URL: https://issues.apache.org/jira/browse/SPARK-16893 > Project: Spark > Issue Type: Documentation >Affects Versions: 2.0.0 >Reporter: Aseem Bansal >Priority: Minor > > I was working with databricks spark csv library and came across an error. I > have logged the issue in their github but it would be good to document that > in Apache Spark's documentation also > I faced it with CSV. Someone else faced that with JSON > http://stackoverflow.com/questions/38761920/spark2-0-error-multiple-sources-found-for-json-when-read-json-file > Complete Issue details here > https://github.com/databricks/spark-csv/issues/367 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16893) Spark CSV Provider option is not documented
[ https://issues.apache.org/jira/browse/SPARK-16893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409183#comment-15409183 ] Aseem Bansal commented on SPARK-16893: -- Reading a CSV causes an exception. The code used and the exception are below. Also present in the GitHub issue that I have referenced here.

{code}
public static void main(String[] args) {
    SparkSession spark = SparkSession
            .builder()
            .appName("my app")
            .getOrCreate();

    Dataset<Row> df = spark.read()
            .format("com.databricks.spark.csv")
            .option("header", "true")
            .option("nullValue", "")
            .csv("/home/aseem/data.csv");

    df.show();
}
{code}

bq. Exception in thread "main" java.lang.RuntimeException: Multiple sources found for csv (org.apache.spark.sql.execution.datasources.csv.CSVFileFormat, com.databricks.spark.csv.DefaultSource15), please specify the fully qualified class name.

People need to use format("csv"). I think that is counterintuitive seeing that I am using the CSV method. > Spark CSV Provider option is not documented > --- > > Key: SPARK-16893 > URL: https://issues.apache.org/jira/browse/SPARK-16893 > Project: Spark > Issue Type: Documentation >Affects Versions: 2.0.0 >Reporter: Aseem Bansal >Priority: Minor > > I was working with databricks spark csv library and came across an error. I > have logged the issue in their github but it would be good to document that > in Apache Spark's documentation also > I faced it with CSV. Someone else faced that with JSON > http://stackoverflow.com/questions/38761920/spark2-0-error-multiple-sources-found-for-json-when-read-json-file > Complete Issue details here > https://github.com/databricks/spark-csv/issues/367 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
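Following the exception message's own suggestion, naming the provider by its fully qualified class removes the ambiguity between the built-in reader and the external spark-csv package on the same classpath. A minimal sketch reusing the file path from the comment above:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadCsvUnambiguously {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("csv provider")
                .master("local[2]")
                .getOrCreate();

        // The fully qualified provider class picks the built-in 2.0 reader
        // explicitly, so "Multiple sources found for csv" cannot occur.
        Dataset<Row> df = spark.read()
                .format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat")
                .option("header", "true")
                .option("nullValue", "")
                .load("/home/aseem/data.csv");

        df.show();
        spark.stop();
    }
}
```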
[jira] [Comment Edited] (SPARK-16895) Reading empty string from csv has changed behaviour
[ https://issues.apache.org/jira/browse/SPARK-16895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408892#comment-15408892 ] Aseem Bansal edited comment on SPARK-16895 at 8/5/16 5:19 AM: -- I see that this is duplicate. Regarding it being a bug or not I heard someone say this related to frameworks. > If a feature is not documented it does not exist. If a change is not > documented then it is a bug. was (Author: anshbansal): I understand that it is duplicate. Regarding it being a bug or not I heard someone say this. > If a feature is not documented it does not exist. If a change is not > documented then it is a bug. > Reading empty string from csv has changed behaviour > --- > > Key: SPARK-16895 > URL: https://issues.apache.org/jira/browse/SPARK-16895 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Aseem Bansal > > I have a file called test.csv > "a" > "" > When I read it in Spark 1.4 I get an empty string as value. When I read it in > 2.0 I get "null" as the String. > The testing code is same as mentioned at > https://github.com/databricks/spark-csv/issues/367 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16895) Reading empty string from csv has changed behaviour
[ https://issues.apache.org/jira/browse/SPARK-16895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408892#comment-15408892 ] Aseem Bansal commented on SPARK-16895: -- I understand that it is duplicate. Regarding it being a bug or not I heard someone say this. > If a feature is not documented it does not exist. If a change is not > documented then it is a bug. > Reading empty string from csv has changed behaviour > --- > > Key: SPARK-16895 > URL: https://issues.apache.org/jira/browse/SPARK-16895 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Aseem Bansal > > I have a file called test.csv > "a" > "" > When I read it in Spark 1.4 I get an empty string as value. When I read it in > 2.0 I get "null" as the String. > The testing code is same as mentioned at > https://github.com/databricks/spark-csv/issues/367 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16896) Loading csv with duplicate column names
Aseem Bansal created SPARK-16896: Summary: Loading csv with duplicate column names Key: SPARK-16896 URL: https://issues.apache.org/jira/browse/SPARK-16896 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Reporter: Aseem Bansal It would be great if the library allowed us to load a csv with duplicate column names. I understand that having duplicate columns in the data is odd, but sometimes we get data that has duplicate columns; getting upstream data like that can happen. We may choose to ignore them, but currently there is no way to drop those as we are not able to load them at all. Currently, as a pre-processing step, I loaded the data into R, changed the column names, and then made a fixed version with which the Spark Java API can work. As for other options: R has read.csv, which automatically takes care of such a situation by appending a number to the column name. Case sensitivity in column names can also cause problems. I mean, if we have columns like ColumnName, columnName I may want to have them as separate. But the option to do this is not documented. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
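The R behaviour mentioned in the report (read.csv appending a number to repeated column names) can be sketched in plain Java. This ColumnDeduper helper is hypothetical and not part of Spark; it only shows the renaming scheme a pre-processing step could apply to a header before loading:

```java
import java.util.*;

// Sketch of the R-style fix the report mentions: make duplicate column
// names unique by appending a numeric suffix, so a header like
// [a, b, a, a] becomes [a, b, a.1, a.2] before the data reaches Spark.
// Hypothetical helper; a real version would also guard against the new
// name colliding with an existing column.
public class ColumnDeduper {
    public static List<String> makeUnique(List<String> names) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> result = new ArrayList<>();
        for (String name : names) {
            Integer count = seen.get(name);
            if (count == null) {
                seen.put(name, 0);          // first occurrence keeps its name
                result.add(name);
            } else {
                seen.put(name, count + 1);  // later occurrences get a suffix
                result.add(name + "." + (count + 1));
            }
        }
        return result;
    }
}
```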
[jira] [Commented] (SPARK-16896) Loading csv with duplicate column names
[ https://issues.apache.org/jira/browse/SPARK-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407604#comment-15407604 ] Aseem Bansal commented on SPARK-16896: -- [~hyukjin.kwon] cc > Loading csv with duplicate column names > --- > > Key: SPARK-16896 > URL: https://issues.apache.org/jira/browse/SPARK-16896 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Aseem Bansal > > It would be great if the library allows us to load csv with duplicate column > names. I understand that having duplicate columns in the data is odd but > sometimes we get data that has duplicate columns. Getting upstream data like > that can happen. We may choose to ignore them but currently there is no way > to drop those as we are not able to load them at all. Currently as a > pre-processing I loaded the data into R, changed the column names and then > make a fixed version with which Spark Java API can work. > But if talk about other options, e.g. R has read.csv which automatically > takes care of such situation by appending a number to the column name. > Also case sensitivity in column names can also cause problems. I mean if we > have columns like > ColumnName, columnName > I may want to have them as separate. But the option to do this is not > documented. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16893) Spark CSV Provider option is not documented
[ https://issues.apache.org/jira/browse/SPARK-16893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407601#comment-15407601 ] Aseem Bansal commented on SPARK-16893: -- [~hyukjin.kwon] cc > Spark CSV Provider option is not documented > --- > > Key: SPARK-16893 > URL: https://issues.apache.org/jira/browse/SPARK-16893 > Project: Spark > Issue Type: Documentation >Affects Versions: 2.0.0 >Reporter: Aseem Bansal > > I was working with databricks spark csv library and came across an error. I > have logged the issue in their github but it would be good to document that > in Apache Spark's documentation also > I faced it with CSV. Someone else faced that with JSON > http://stackoverflow.com/questions/38761920/spark2-0-error-multiple-sources-found-for-json-when-read-json-file > Complete Issue details here > https://github.com/databricks/spark-csv/issues/367 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16895) Reading empty string from csv has changed behaviour
[ https://issues.apache.org/jira/browse/SPARK-16895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407602#comment-15407602 ] Aseem Bansal commented on SPARK-16895: -- [~hyukjin.kwon] cc > Reading empty string from csv has changed behaviour > --- > > Key: SPARK-16895 > URL: https://issues.apache.org/jira/browse/SPARK-16895 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Aseem Bansal > > I have a file called test.csv > "a" > "" > When I read it in Spark 1.4 I get an empty string as value. When I read it in > 2.0 I get "null" as the String. > The testing code is same as mentioned at > https://github.com/databricks/spark-csv/issues/367 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16895) Reading empty string from csv has changed behaviour
Aseem Bansal created SPARK-16895: Summary: Reading empty string from csv has changed behaviour Key: SPARK-16895 URL: https://issues.apache.org/jira/browse/SPARK-16895 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Reporter: Aseem Bansal I have a file called test.csv "a" "" When I read it in Spark 1.4 I get an empty string as value. When I read it in 2.0 I get "null" as the String. The testing code is same as mentioned at https://github.com/databricks/spark-csv/issues/367 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
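A plausible explanation for the behaviour change is the nullValue option of Spark 2.0's built-in csv source, whose default is believed to be the empty string, so an empty field now matches the null marker. This tiny pure-Java sketch (a hypothetical interpret helper, not Spark's parser) illustrates the distinction the issue is about:

```java
// Minimal illustration (NOT Spark's parser) of how a nullValue setting
// changes what an empty CSV field deserializes to: with nullValue = ""
// an empty field becomes null (the reported Spark 2.0 behaviour), while
// any non-matching nullValue leaves the empty string intact (the
// Spark 1.4 behaviour the reporter expected).
public class NullValueDemo {
    public static String interpret(String rawField, String nullValue) {
        if (rawField.equals(nullValue)) {
            return null;   // field matches the configured null marker
        }
        return rawField;   // otherwise keep the literal value
    }
}
```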
[jira] [Updated] (SPARK-16893) Spark CSV Provider option is not documented
[ https://issues.apache.org/jira/browse/SPARK-16893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aseem Bansal updated SPARK-16893: - Description: I was working with databricks spark csv library and came across an error. I have logged the issue in their github but it would be good to document that in Apache Spark's documentation also I faced it with CSV. Someone else faced that with JSON http://stackoverflow.com/questions/38761920/spark2-0-error-multiple-sources-found-for-json-when-read-json-file Complete Issue details here https://github.com/databricks/spark-csv/issues/367 was: I was working with databricks spark csv library and came across an error. I have logged the issue in their github but it would be good to document that in Apache Spark's documentation also Details here https://github.com/databricks/spark-csv/issues/367 > Spark CSV Provider option is not documented > --- > > Key: SPARK-16893 > URL: https://issues.apache.org/jira/browse/SPARK-16893 > Project: Spark > Issue Type: Documentation >Affects Versions: 2.0.0 >Reporter: Aseem Bansal > > I was working with databricks spark csv library and came across an error. I > have logged the issue in their github but it would be good to document that > in Apache Spark's documentation also > I faced it with CSV. Someone else faced that with JSON > http://stackoverflow.com/questions/38761920/spark2-0-error-multiple-sources-found-for-json-when-read-json-file > Complete Issue details here > https://github.com/databricks/spark-csv/issues/367 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16893) Spark CSV Provider option is not documented
Aseem Bansal created SPARK-16893: Summary: Spark CSV Provider option is not documented Key: SPARK-16893 URL: https://issues.apache.org/jira/browse/SPARK-16893 Project: Spark Issue Type: Documentation Affects Versions: 2.0.0 Reporter: Aseem Bansal I was working with databricks spark csv library and came across an error. I have logged the issue in their github but it would be good to document that in Apache Spark's documentation also Details here https://github.com/databricks/spark-csv/issues/367 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (GROOVY-7727) Cannot create unicode sequences using \Uxxxxxx
[ https://issues.apache.org/jira/browse/GROOVY-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123730#comment-15123730 ] Aseem Bansal commented on GROOVY-7727: -- I tried your code in Java. \Uxxx failed in java also https://ideone.com/W99GOs And that line says "provides a code point input method which accepts strings". So there is a specific method which accepts this format. So this is not a bug. If you find the method name please do share. > Cannot create unicode sequences using \Uxx > -- > > Key: GROOVY-7727 > URL: https://issues.apache.org/jira/browse/GROOVY-7727 > Project: Groovy > Issue Type: Bug > Components: Compiler >Affects Versions: 2.4.5 >Reporter: Andres Almiray > > According to > http://www.oracle.com/technetwork/articles/java/supplementary-142654.html > "For text input, the Java 2 SDK provides a code point input method which > accepts strings of the form "\Uxx", where the uppercase "U" indicates > that the escape sequence contains six hexadecimal digits, thus allowing for > supplementary characters. A lowercase "u" indicates the original form of the > escape sequences, "\u". You can find this input method and its > documentation in the directory demo/jfc/CodePointIM of the J2SDK." > The following code fails with a syntax exception > s = "\U01f5d0" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
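To make the comment concrete: the \U escape described in the linked article is an input-method convention, not a string escape in Java (or Groovy). A supplementary character such as U+1F5D0 can instead be built from its code point with Character.toChars, or written as a surrogate pair of ordinary \u escapes:

```java
// Two equivalent ways to put the supplementary character U+1F5D0 into a
// Java String, since the \U escape form is not part of the language.
public class CodePointDemo {
    public static String fromCodePoint(int codePoint) {
        // Character.toChars expands a code point into its UTF-16 units
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String viaCodePoint = fromCodePoint(0x1F5D0);
        String viaSurrogates = "\uD83D\uDDD0"; // same character as a surrogate pair
        System.out.println(viaCodePoint.equals(viaSurrogates)); // prints "true"
    }
}
```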
[jira] [Commented] (GROOVY-7625) Slashy string in groovy allows brackets but double quoted string does not. Why?
[ https://issues.apache.org/jira/browse/GROOVY-7625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107056#comment-15107056 ] Aseem Bansal commented on GROOVY-7625: -- PR for this https://github.com/apache/groovy/pull/243 > Slashy string in groovy allows brackets but double quoted string does not. > Why? > --- > > Key: GROOVY-7625 > URL: https://issues.apache.org/jira/browse/GROOVY-7625 > Project: Groovy > Issue Type: Documentation >Reporter: Aseem Bansal >Priority: Minor > > This > println("$()") > gives me a compiler error "Either escape a dollar sign or bracket the value > expression" > But this > println(/$()/) > prints `$()` fine. No errors > Why is there a difference? The only documented difference is that slashy > strings make working with backslashes easier. I understand that a variable > name cannot start with a bracket so it should be possible to make that > special case. Is that the case for the slashy strings? > Just came across this when doing something with regex. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GROOVY-7625) Slashy string in groovy allows brackets but double quoted string does not. Why?
Aseem Bansal created GROOVY-7625: Summary: Slashy string in groovy allows brackets but double quoted string does not. Why? Key: GROOVY-7625 URL: https://issues.apache.org/jira/browse/GROOVY-7625 Project: Groovy Issue Type: Documentation Reporter: Aseem Bansal Priority: Minor This println("$()") gives me a compiler error "Either escape a dollar sign or bracket the value expression" But this println(/$()/) prints `$()` fine. No errors Why is there a difference? The only documented difference is that slashy strings make working with backslashes easier. I understand that a variable name cannot start with a bracket so it should be possible to make that special case. Is that the case for the slashy strings? Just came across this when doing something with regex. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GROOVY-7605) Improve docs for MetaClass getMethods vs getMetaMethods
Aseem Bansal created GROOVY-7605: Summary: Improve docs for MetaClass getMethods vs getMetaMethods Key: GROOVY-7605 URL: https://issues.apache.org/jira/browse/GROOVY-7605 Project: Groovy Issue Type: Documentation Reporter: Aseem Bansal The current explanation at http://docs.groovy-lang.org/latest/html/api/groovy/lang/MetaClass.html is not clear. I know that there is an explanation at http://www.groovy-lang.org/mailing-lists.html#nabble-td388327 but it just shows that Graeme added a method. I am guessing he added getMetaMethods. As far as I can tell by running them getMethods is giving non-meta methods while getMetaMethods is only giving the meta. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (GROOVY-7603) Update groovy docs for Category
[ https://issues.apache.org/jira/browse/GROOVY-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aseem Bansal updated GROOVY-7603: - Description: Category docs refer to @Mixin but they are deprecated http://docs.groovy-lang.org/latest/html/gapi/groovy/lang/Category.html I found that using traits is not possible. But it would be nice to be able to use them. Tried a workaround but it didn't work
{noformat}
trait Util {
    Number getTwice() { this * 2 }
    Number max(Number otherNumber) { Math.max(this, otherNumber) }
}

@groovy.lang.Category(Number)
abstract class UtilCategory implements Util {
}
{noformat}
was: Category docs refer to @Mixin but they are deprecated http://docs.groovy-lang.org/latest/html/gapi/groovy/lang/Category.html I found that using traits is not possible. But it would be nice to be able to use them. > Update groovy docs for Category > --- > > Key: GROOVY-7603 > URL: https://issues.apache.org/jira/browse/GROOVY-7603 > Project: Groovy > Issue Type: Documentation >Reporter: Aseem Bansal > > Category docs refer to @Mixin but they are deprecated > http://docs.groovy-lang.org/latest/html/gapi/groovy/lang/Category.html > I found that using traits is not possible. But it would be nice to be able to > use them. > Tried a workaround but it didn't work
> {noformat}
> trait Util {
>     Number getTwice() { this * 2 }
>     Number max(Number otherNumber) { Math.max(this, otherNumber) }
> }
>
> @groovy.lang.Category(Number)
> abstract class UtilCategory implements Util {
> }
> {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (GROOVY-7603) Update groovy docs for Category
[ https://issues.apache.org/jira/browse/GROOVY-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aseem Bansal updated GROOVY-7603: - Description: Category docs refer to @Mixin but they are deprecated http://docs.groovy-lang.org/latest/html/gapi/groovy/lang/Category.html I found that using traits is not possible. But it would be nice to be able to use them. was: Category docs refer to @Mixin but they are deprecated http://docs.groovy-lang.org/latest/html/gapi/groovy/lang/Category.html Can traits be used? > Update groovy docs for Category > --- > > Key: GROOVY-7603 > URL: https://issues.apache.org/jira/browse/GROOVY-7603 > Project: Groovy > Issue Type: Documentation >Reporter: Aseem Bansal > > Category docs refer to @Mixin but they are deprecated > http://docs.groovy-lang.org/latest/html/gapi/groovy/lang/Category.html > I found that using traits is not possible. But it would be nice to be able to > use them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (GROOVY-7604) traits docs diamond problem explanation
[ https://issues.apache.org/jira/browse/GROOVY-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908937#comment-14908937 ] Aseem Bansal edited comment on GROOVY-7604 at 9/26/15 12:42 AM: I read further and found the section "Default conflict resolution" which gives the correct explanation. But the earlier one should be corrected/refer to this section. was (Author: anshbansal): I read further and found the section "Default conflict resolution" which gives the correct explanation > traits docs diamond problem explanation > --- > > Key: GROOVY-7604 > URL: https://issues.apache.org/jira/browse/GROOVY-7604 > Project: Groovy > Issue Type: Documentation >Reporter: Aseem Bansal > > http://www.groovy-lang.org/objectorientation.html#_composition_of_behaviors > has an example after referring diamond problem. Is it correct example for > diamond problem? Shouldn't the method names be the same? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GROOVY-7604) traits docs diamond problem explanation
[ https://issues.apache.org/jira/browse/GROOVY-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908937#comment-14908937 ] Aseem Bansal commented on GROOVY-7604: -- I read further and found the section "Default conflict resolution" which gives the correct explanation > traits docs diamond problem explanation > --- > > Key: GROOVY-7604 > URL: https://issues.apache.org/jira/browse/GROOVY-7604 > Project: Groovy > Issue Type: Documentation >Reporter: Aseem Bansal > > http://www.groovy-lang.org/objectorientation.html#_composition_of_behaviors > has an example after referring diamond problem. Is it correct example for > diamond problem? Shouldn't the method names be the same? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GROOVY-7604) traits docs diamond problem explanation
Aseem Bansal created GROOVY-7604: Summary: traits docs diamond problem explanation Key: GROOVY-7604 URL: https://issues.apache.org/jira/browse/GROOVY-7604 Project: Groovy Issue Type: Documentation Reporter: Aseem Bansal http://www.groovy-lang.org/objectorientation.html#_composition_of_behaviors has an example after referring diamond problem. Is it correct example for diamond problem? Shouldn't the method names be the same? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GROOVY-7603) Update groovy docs for Category
Aseem Bansal created GROOVY-7603: Summary: Update groovy docs for Category Key: GROOVY-7603 URL: https://issues.apache.org/jira/browse/GROOVY-7603 Project: Groovy Issue Type: Documentation Reporter: Aseem Bansal Category docs refer to @Mixin but they are deprecated http://docs.groovy-lang.org/latest/html/gapi/groovy/lang/Category.html Can traits be used? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GROOVY-7592) Problem in switch statement docs
[ https://issues.apache.org/jira/browse/GROOVY-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901833#comment-14901833 ] Aseem Bansal commented on GROOVY-7592: -- [~pascalschumacher] Did you fix the docs? Because I remember that there is another place where that is written. Don't remember exactly where. > Problem in switch statement docs > > > Key: GROOVY-7592 > URL: https://issues.apache.org/jira/browse/GROOVY-7592 > Project: Groovy > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.4.5 >Reporter: Aseem Bansal >Assignee: Pascal Schumacher >Priority: Minor > Fix For: 2.4.6 > > > As per http://www.groovy-lang.org/semantics.html default must be the last > thing in switch case. > Based on that I sent a PR which has been accepted > https://github.com/apache/incubator-groovy/pull/82 > But when I tried to use default somewhere else it worked fine
> {noformat}
> String str = "aseem"
> switch(str) {
>     default:
>         println "default"
>         break
>     case "aseem":
>         println "Aseem"
>         break
> }
> {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GROOVY-7592) Problem in switch statement docs
Aseem Bansal created GROOVY-7592: Summary: Problem in switch statement docs Key: GROOVY-7592 URL: https://issues.apache.org/jira/browse/GROOVY-7592 Project: Groovy Issue Type: Documentation Reporter: Aseem Bansal As per http://www.groovy-lang.org/semantics.html default must be the last thing in switch case. Based on that I sent a PR which has been accepted https://github.com/apache/incubator-groovy/pull/82 But when I tried to use default somewhere else it worked fine
{noformat}
String str = "aseem"
switch(str) {
    default:
        println "default"
        break
    case "aseem":
        println "Aseem"
        break
}
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GROOVY-7578) Image present for metaprogramming is incorrect
[ https://issues.apache.org/jira/browse/GROOVY-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731978#comment-14731978 ] Aseem Bansal commented on GROOVY-7578: -- Thanks for the detailed response. Much appreciated. > Image present for metaprogramming is incorrect > -- > > Key: GROOVY-7578 > URL: https://issues.apache.org/jira/browse/GROOVY-7578 > Project: Groovy > Issue Type: Documentation >Reporter: Aseem Bansal > > I am reading groovy metaprogramming documentation and saw that the image > present is wrong. I mean there is an image present but the flow is wrong. > It has a block "Method exists in MetaClass or Class" two times. One after > GroovyInterceptable and other after the first "Method exists in MetaClass or > Class" . > I tried a simple program and as per this I believe the first block should > have class only and second block should have MetaClass only -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GROOVY-7578) Image present for metaprogramming is incorrect
[ https://issues.apache.org/jira/browse/GROOVY-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731832#comment-14731832 ] Aseem Bansal commented on GROOVY-7578: -- [~paulk] Isn't "Method exists in MetaClass or Class" also ambiguous? I mean which is checked first Class or metaclass? What happens if there is a hierarchy of classes? This is not covered and the information is not currently available anywhere due to which I asked a [question on stackoverflow|http://stackoverflow.com/questions/32405568/how-does-groovys-meta-object-protocol-work-in-case-of-hierarchy-of-meta-classes]. Can you answer that? > Image present for metaprogramming is incorrect > -- > > Key: GROOVY-7578 > URL: https://issues.apache.org/jira/browse/GROOVY-7578 > Project: Groovy > Issue Type: Documentation >Reporter: Aseem Bansal > > I am reading groovy metaprogramming documentation and saw that the image > present is wrong. I mean there is an image present but the flow is wrong. > It has a block "Method exists in MetaClass or Class" two times. One after > GroovyInterceptable and other after the first "Method exists in MetaClass or > Class" . > I tried a simple program and as per this I believe the first block should > have class only and second block should have MetaClass only -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GROOVY-7577) Groovy class docs all have ASF license as their explanation
[ https://issues.apache.org/jira/browse/GROOVY-7577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731368#comment-14731368 ] Aseem Bansal commented on GROOVY-7577: -- I noticed that If I change gapi to api the correct thing is shown. Is there a reason why there are two hosted? I landed at the gapi version via google search. > Groovy class docs all have ASF license as their explanation > --- > > Key: GROOVY-7577 > URL: https://issues.apache.org/jira/browse/GROOVY-7577 > Project: Groovy > Issue Type: Documentation >Affects Versions: 2.4.4 >Reporter: Aseem Bansal >Priority: Critical > > I opened one of groovy's API docs and noticed a weird thing. The only > explanation that the class had is ASF license. > e.g. > http://docs.groovy-lang.org/2.4.4/html/gapi/index.html?groovy/lang/GroovyObjectSupport.html > I checked the source at > https://github.com/apache/incubator-groovy/blob/master/src/main/groovy/lang/GroovyObjectSupport.java > and found that there is some actual explanation but that is not present in > the API docs. > So basically somehow the complete groovy API docs are missing a lot of > information. > I opened > http://docs.groovy-lang.org/2.4.4/html/gapi/groovy/lang/package-summary.html > and saw that the description column contains the ASF license. Certainly > nobody would expect that in an API documentation. > Now I am sure that this is not required by Apache software Foundation as I > cross-checked in other Apache projects and they had some actual explanation. > It would be great if this is taken care of because in my opinion this just > makes the API docs very bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GROOVY-7580) ExpandoMetaClass append method does not throw an exception as per docs
Aseem Bansal created GROOVY-7580: Summary: ExpandoMetaClass append method does not throw an exception as per docs Key: GROOVY-7580 URL: https://issues.apache.org/jira/browse/GROOVY-7580 Project: Groovy Issue Type: Bug Reporter: Aseem Bansal I was reading the docs when I came across "Note that the left shift operator is used to append a new method. If the method already exists an exception will be thrown." I decided to try it via the below program. There was no exception. I am using groovy 2.3.8
{noformat}
class A { }

A.metaClass.hello = { "hello superclass" }

class B extends A { }

B.metaClass.hello << { "hello subclass" }
B.metaClass.hello << { "hello subclass" }

new B().hello()
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GROOVY-7578) Image present for metaprogramming is incorrect
[ https://issues.apache.org/jira/browse/GROOVY-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731260#comment-14731260 ] Aseem Bansal commented on GROOVY-7578: -- Also a simpler reasoning for why that is incorrect is that only one thing can be checked - either class or metaclass. Not both of them can be checked at the same time. > Image present for metaprogramming is incorrect > -- > > Key: GROOVY-7578 > URL: https://issues.apache.org/jira/browse/GROOVY-7578 > Project: Groovy > Issue Type: Documentation >Reporter: Aseem Bansal > > I am reading groovy metaprogramming documentation and saw that the image > present is wrong. I mean there is an image present but the flow is wrong. > It has a block "Method exists in MetaClass or Class" two times. One after > GroovyInterceptable and other after the first "Method exists in MetaClass or > Class" . > I tried a simple program and as per this I believe the first block should > have class only and second block should have MetaClass only -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GROOVY-7579) Improve docs for invokeMethod
Aseem Bansal created GROOVY-7579: Summary: Improve docs for invokeMethod Key: GROOVY-7579 URL: https://issues.apache.org/jira/browse/GROOVY-7579 Project: Groovy Issue Type: Documentation Reporter: Aseem Bansal I was reading the metaprogramming documentation when I noticed that it says "this method is called when the method you called is not present on a Groovy object". As per the diagram that is incorrect: it is invoked when methodMissing is not present. That statement would match the diagram. Also, as per the answer at http://stackoverflow.com/questions/19220370/what-is-the-difference-between-invokemethod-and-methodmissing, this is not an appropriate example. Saying this because the answer by blackdrag (who I understand is a core committer to groovy) says that methodMissing should be used instead. Also the same page mentions the "overhead of invokeMethod". It would be nice to have a better explanation in the section on invokeMethod itself. I am not knowledgeable about this so cannot suggest what can be added. But it would be better to have the explanation in the official docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GROOVY-7578) Image present for metaprogramming is incorrect
Aseem Bansal created GROOVY-7578: Summary: Image present for metaprogramming is incorrect Key: GROOVY-7578 URL: https://issues.apache.org/jira/browse/GROOVY-7578 Project: Groovy Issue Type: Documentation Reporter: Aseem Bansal I am reading groovy metaprogramming documentation and saw that the image present is wrong. I mean there is an image present but the flow is wrong. It has a block "Method exists in MetaClass or Class" two times. One after GroovyInterceptable and other after the first "Method exists in MetaClass or Class" . I tried a simple program and as per this I believe the first block should have class only and second block should have MetaClass only -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GROOVY-7577) Groovy class docs all have ASF license as their explanation
Aseem Bansal created GROOVY-7577: Summary: Groovy class docs all have ASF license as their explanation Key: GROOVY-7577 URL: https://issues.apache.org/jira/browse/GROOVY-7577 Project: Groovy Issue Type: Documentation Affects Versions: 2.4.4 Reporter: Aseem Bansal Priority: Critical I opened one of groovy's API docs and noticed a weird thing. The only explanation that the class had is ASF license. e.g. http://docs.groovy-lang.org/2.4.4/html/gapi/index.html?groovy/lang/GroovyObjectSupport.html I checked the source at https://github.com/apache/incubator-groovy/blob/master/src/main/groovy/lang/GroovyObjectSupport.java and found that there is some actual explanation but that is not present in the API docs. So basically somehow the complete groovy API docs are missing a lot of information. I opened http://docs.groovy-lang.org/2.4.4/html/gapi/groovy/lang/package-summary.html and saw that the description column contains the ASF license. Certainly nobody would expect that in an API documentation. Now I am sure that this is not required by Apache software Foundation as I cross-checked in other Apache projects and they had some actual explanation. It would be great if this is taken care of because in my opinion this just makes the API docs very bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-2425) Migrate website from SVN to Git
[ https://issues.apache.org/jira/browse/KAFKA-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703327#comment-14703327 ] Aseem Bansal edited comment on KAFKA-2425 at 8/19/15 4:47 PM: -- Sorry [~omkreddy] but not getting the time currently. Been busy for past some days. I checked INFRA team ticket and it says that only asf-site branch is supported. I understand that it is definitely a bummer. Just thinking whether it would be possible to use travis or something else to auto cherry pick from trunk/master to this branch? Then the commits can be done to master and let the script do the cherry picks. Don't know how to do it but will look if it is possible. Something like http://lea.verou.me/2011/10/easily-keep-gh-pages-in-sync-with-master/ was (Author: anshbansal): Sorry [~omkreddy] but not getting the time currently. Been busy for past some days. I checked INFRA team ticket and it says that only asf-site branch is supported. I understand that it is definitely a bummer. Just thinking whether it would be possible to use travis or something else to auto cherry pick from trunk/master to this branch? Then the commits can be done to master and let the script do the cherry picks. Don't know how to do it but will look if it is possible. > Migrate website from SVN to Git > > > Key: KAFKA-2425 > URL: https://issues.apache.org/jira/browse/KAFKA-2425 > Project: Kafka > Issue Type: Improvement >Reporter: Ismael Juma >Assignee: Manikumar Reddy > > The preference is to share the same Git repo for the code and website as per > discussion in the mailing list: > http://search-hadoop.com/m/uyzND1Dux842dm7vg2 > Useful reference: > https://blogs.apache.org/infra/entry/git_based_websites_available -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2425) Migrate website from SVN to Git
[ https://issues.apache.org/jira/browse/KAFKA-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703327#comment-14703327 ] Aseem Bansal commented on KAFKA-2425: - Sorry [~omkreddy], but I am not getting the time currently; I have been busy for the past few days. I checked the INFRA team ticket and it says that only the asf-site branch is supported. I understand that it is definitely a bummer. I am just wondering whether it would be possible to use Travis or something else to auto cherry-pick from trunk/master to this branch. Then commits could be made to master and a script would do the cherry-picks. I don't know how to do it yet, but I will look into whether it is possible. > Migrate website from SVN to Git > > > Key: KAFKA-2425 > URL: https://issues.apache.org/jira/browse/KAFKA-2425 > Project: Kafka > Issue Type: Improvement >Reporter: Ismael Juma >Assignee: Manikumar Reddy > > The preference is to share the same Git repo for the code and website as per > discussion in the mailing list: > http://search-hadoop.com/m/uyzND1Dux842dm7vg2 > Useful reference: > https://blogs.apache.org/infra/entry/git_based_websites_available -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-2425) Migrate website from SVN to Git
[ https://issues.apache.org/jira/browse/KAFKA-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693505#comment-14693505 ] Aseem Bansal edited comment on KAFKA-2425 at 8/12/15 1:54 PM: -- The Infra ticket has fields "Git Notification Mailing List" and "Git Repository Import Path". I am not sure what they are. Project: Infrastructure Issue Type: SVN->GIT Migration was (Author: anshbansal): The Infra ticket has fields "Git Notification Mailing List" and "Git Repository Import Path". I am not sure what they are. > Migrate website from SVN to Git > > > Key: KAFKA-2425 > URL: https://issues.apache.org/jira/browse/KAFKA-2425 > Project: Kafka > Issue Type: Improvement >Reporter: Ismael Juma > > The preference is to share the same Git repo for the code and website as per > discussion in the mailing list: > http://search-hadoop.com/m/uyzND1Dux842dm7vg2 > Useful reference: > https://blogs.apache.org/infra/entry/git_based_websites_available -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2425) Migrate website from SVN to Git
[ https://issues.apache.org/jira/browse/KAFKA-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693505#comment-14693505 ] Aseem Bansal commented on KAFKA-2425: - The Infra ticket has fields "Git Notification Mailing List" and "Git Repository Import Path". I am not sure what they are. > Migrate website from SVN to Git > > > Key: KAFKA-2425 > URL: https://issues.apache.org/jira/browse/KAFKA-2425 > Project: Kafka > Issue Type: Improvement >Reporter: Ismael Juma > > The preference is to share the same Git repo for the code and website as per > discussion in the mailing list: > http://search-hadoop.com/m/uyzND1Dux842dm7vg2 > Useful reference: > https://blogs.apache.org/infra/entry/git_based_websites_available -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2425) Migrate website from SVN to Git
[ https://issues.apache.org/jira/browse/KAFKA-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693316#comment-14693316 ] Aseem Bansal commented on KAFKA-2425: - Yes, I am interested. But how do I do that? I can take a checkout of the Kafka code from https://github.com/apache/kafka, but where can I get the SVN code? Also, is there anything specific to take care of? > Migrate website from SVN to Git > > > Key: KAFKA-2425 > URL: https://issues.apache.org/jira/browse/KAFKA-2425 > Project: Kafka > Issue Type: Improvement >Reporter: Ismael Juma > > The preference is to share the same Git repo for the code and website as per > discussion in the mailing list: > http://search-hadoop.com/m/uyzND1Dux842dm7vg2 > Useful reference: > https://blogs.apache.org/infra/entry/git_based_websites_available -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (GROOVY-7544) Nearly Duplicate sections in documentation
[ https://issues.apache.org/jira/browse/GROOVY-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aseem Bansal updated GROOVY-7544: - Description: I was reading the documentation when I noticed that these two sections * http://www.groovy-lang.org/download.html * http://www.groovy-lang.org/install.html are nearly duplicates. Because of the duplication, each of them has some information that the other one does not have. It would be better to merge them into a single section. I would suggest keeping just the download section, as it is better looking; merge the extra information from the install section and then delete the install section. was: I was reading the documentation when I noticed that these two sections * http://www.groovy-lang.org/download.html * http://www.groovy-lang.org/install.html are nearly duplicates. Because of the duplication, each of them has some information that the other one does not have. It would be better to merge them into a single section. > Nearly Duplicate sections in documentation > -- > > Key: GROOVY-7544 > URL: https://issues.apache.org/jira/browse/GROOVY-7544 > Project: Groovy > Issue Type: Improvement > Components: Documentation >Affects Versions: 2.4.4 >Reporter: Aseem Bansal > > I was reading the documentation when I noticed that these two sections > * http://www.groovy-lang.org/download.html > * http://www.groovy-lang.org/install.html > are nearly duplicates. Because of the duplication, each of them has some > information that the other one does not have. It would be better to merge > them into a single section. > I would suggest keeping just the download section, as it is better looking. > Merge the extra information from the install section and then delete the > install section. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GROOVY-7544) Nearly Duplicate sections in documentation
Aseem Bansal created GROOVY-7544: Summary: Nearly Duplicate sections in documentation Key: GROOVY-7544 URL: https://issues.apache.org/jira/browse/GROOVY-7544 Project: Groovy Issue Type: Improvement Components: Documentation Affects Versions: 2.4.4 Reporter: Aseem Bansal I was reading the documentation when I noticed that these two sections * http://www.groovy-lang.org/download.html * http://www.groovy-lang.org/install.html are nearly duplicates. Because of the duplication, each of them has some information that the other one does not have. It would be better to merge them into a single section. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (GROOVY-7543) Suggestion for Download page
[ https://issues.apache.org/jira/browse/GROOVY-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aseem Bansal updated GROOVY-7543: - Description: On the groovy website download page http://www.groovy-lang.org/download.html there is a "System requirements" at the bottom. It says "JVM Required". Is it minimum/maximum/only version of JVM supported? If it is based on some automated build to test compatibility it would be good to link that. was: On the groovy website download page http://www.groovy-lang.org/download.html there is a "System requirements" at the bottom. It says "JVM Required". Is it minimum/maximum/only ? If it is based on some automated build to test compatibility it would be good to link that. > Suggestion for Download page > > > Key: GROOVY-7543 > URL: https://issues.apache.org/jira/browse/GROOVY-7543 > Project: Groovy > Issue Type: Improvement > Components: Documentation >Affects Versions: 2.4.4 > Environment: Website >Reporter: Aseem Bansal >Priority: Trivial > > On the groovy website download page http://www.groovy-lang.org/download.html > there is a "System requirements" at the bottom. > It says "JVM Required". Is it minimum/maximum/only version of JVM supported? > If it is based on some automated build to test compatibility it would be good > to link that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SPARK-9678) HTTP request to BlockManager port yields exception
[ https://issues.apache.org/jira/browse/SPARK-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662321#comment-14662321 ] Aseem Bansal commented on SPARK-9678: - I understand. Just thought to mention that. > HTTP request to BlockManager port yields exception > -- > > Key: SPARK-9678 > URL: https://issues.apache.org/jira/browse/SPARK-9678 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 1.4.1 > Environment: Ubuntu 14.0.4 >Reporter: Aseem Bansal >Priority: Minor > > I was going through the quick start for spark 1.4.1 at > http://spark.apache.org/docs/latest/quick-start.html. I am using pySpark. > Also the exact version that I am using is spark-1.4.1-bin-hadoop2.4 > The quick start has textFile = sc.textFile("README.md"). I ran that and then > the following text appeared in the command line > {noformat} > 15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(143840) called with > curMem=0, maxMem=278302556 > 15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0 stored as values in > memory (estimated size 140.5 KB, free 265.3 MB) > 15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(12633) called with > curMem=143840, maxMem=278302556 > 15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes > in memory (estimated size 12.3 KB, free 265.3 MB) > 15/08/06 10:37:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory > on localhost:53311 (size: 12.3 KB, free: 265.4 MB) > 15/08/06 10:37:03 INFO SparkContext: Created broadcast 0 from textFile at > NativeMethodAccessorImpl.java:-2 > {noformat} > I saw that there was an IP in these logs i.e. localhost:53311 > I tried connecting to it via Google Chrome and got an exception. 
> {noformat} > >>> 15/08/06 10:37:30 WARN TransportChannelHandler: Exception in connection > >>> from /127.0.0.1:54056 > io.netty.handler.codec.TooLongFrameException: Adjusted frame length exceeds > 2147483647: 5135603447292250196 - discarded > at > io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:501) > at > io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:477) > at > io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:403) > at > io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:343) > at > io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:249) > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:149) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
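The TooLongFrameException in the trace above has a simple explanation: the BlockManager port does not speak HTTP but Spark's framed binary protocol, where Netty's LengthFieldBasedFrameDecoder reads the first 8 bytes of each message as a big-endian frame length. When a browser connects, the ASCII bytes of its request line are interpreted as that length. The 8-byte length-field interpretation is inferred here rather than quoted from the Spark source, but decoding the first eight bytes of "GET / HTTP/1.1" reproduces the exact number in the warning:

```python
import struct

# First 8 bytes a browser sends when requesting the page root.
first_bytes = b"GET / HT"  # from "GET / HTTP/1.1"

# Interpreted as a big-endian signed 64-bit frame length, as a decoder
# with an 8-byte length field would do.
(frame_length,) = struct.unpack(">q", first_bytes)
print(frame_length)  # 5135603447292250196, the value in the warning above
```

Since the decoded value matches the log exactly, the exception is expected behavior for non-protocol traffic, not a bug in the quick start.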
[jira] [Created] (GROOVY-7543) Suggestion for Download page
Aseem Bansal created GROOVY-7543: Summary: Suggestion for Download page Key: GROOVY-7543 URL: https://issues.apache.org/jira/browse/GROOVY-7543 Project: Groovy Issue Type: Improvement Components: Documentation Affects Versions: 2.4.4 Environment: Website Reporter: Aseem Bansal Priority: Trivial On the groovy website download page http://www.groovy-lang.org/download.html there is a "System requirements" at the bottom. It says "JVM Required". Is it the minimum/maximum/only version of JVM supported? If it is based on some automated build to test compatibility it would be good to link that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2364) Improve documentation for contributing to docs
[ https://issues.apache.org/jira/browse/KAFKA-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659917#comment-14659917 ] Aseem Bansal commented on KAFKA-2364: - How do you reply to and get those emails? > Improve documentation for contributing to docs > -- > > Key: KAFKA-2364 > URL: https://issues.apache.org/jira/browse/KAFKA-2364 > Project: Kafka > Issue Type: Task >Reporter: Aseem Bansal >Priority: Minor > Labels: doc > > While reading the documentation for kafka 8 I saw some improvements that can > be made. But the docs for contributing are not very good at > https://github.com/apache/kafka. It just gives me a URL for svn. But I am not > sure what to do. Can the README.MD file be improved for contributing to docs? > I have submitted patches to groovy and grails by sending PRs via github but > looking at the comments on PRs submitted to kafak it seems PRs via github are > not working for kafka. It would be good to make that work also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SPARK-9678) Exception while going through quick start
[ https://issues.apache.org/jira/browse/SPARK-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aseem Bansal updated SPARK-9678: Description: I was going through the quick start for spark 1.4.1 at http://spark.apache.org/docs/latest/quick-start.html. I am using pySpark. Also the exact version that I am using is spark-1.4.1-bin-hadoop2.4 The quick start has textFile = sc.textFile("README.md"). I ran that and then the following text appeared in the command line {noformat} 15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(143840) called with curMem=0, maxMem=278302556 15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 140.5 KB, free 265.3 MB) 15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(12633) called with curMem=143840, maxMem=278302556 15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.3 KB, free 265.3 MB) 15/08/06 10:37:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:53311 (size: 12.3 KB, free: 265.4 MB) 15/08/06 10:37:03 INFO SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:-2 {noformat} I saw that there was an IP in these logs i.e. localhost:53311 I tried connecting to it via Google Chrome and got an exception. 
{noformat} >>> 15/08/06 10:37:30 WARN TransportChannelHandler: Exception in connection >>> from /127.0.0.1:54056 io.netty.handler.codec.TooLongFrameException: Adjusted frame length exceeds 2147483647: 5135603447292250196 - discarded at io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:501) at io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:477) at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:403) at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:343) at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:249) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:149) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) at java.lang.Thread.run(Thread.java:745) {noformat} was: I was going through the quick start for spark 1.4.1 at http://spark.apache.org/docs/latest/quick-start.html. I am using pySpark. Also the exact version that I am using is spark-1.4.1-bin-hadoop2.4 The quick start has textFile = sc.textFile("README.md"). 
I ran that and then the following text appeared in the command line 15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(143840) called with curMem=0, maxMem=278302556 15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 140.5 KB, free 265.3 MB) 15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(12633) called with curMem=143840, maxMem=278302556 15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.3 KB, free 265.3 MB) 15/08/06 10:37:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:53311 (size: 12.3 KB, free: 265.4 MB) 15/08/06 10:37:03 INFO SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:-2 I saw that there was an IP in these logs i.e. localhost:53311 I tried connecting to it via Google Chrome and got an exception. >>> 15/08/06 10:37:30 WARN TransportChannelHandler: Exception in connection >>> from /127.0.0.1:54056 io.netty.handler.codec.TooLongFrameException: Adjusted frame length exceeds 2147483647: 5135603447292250196 - discarded at io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:501) at io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:477) at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:403) at io.netty.handler.codec.LengthFieldBasedFrameDecoder.de
[jira] [Updated] (SPARK-9678) Exception while going through quick start
[ https://issues.apache.org/jira/browse/SPARK-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aseem Bansal updated SPARK-9678: Description: I was going through the quick start for spark 1.4.1 at http://spark.apache.org/docs/latest/quick-start.html. I am using pySpark. Also the exact version that I am using is spark-1.4.1-bin-hadoop2.4 The quick start has textFile = sc.textFile("README.md"). I ran that and then the following text appeared in the command line 15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(143840) called with curMem=0, maxMem=278302556 15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 140.5 KB, free 265.3 MB) 15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(12633) called with curMem=143840, maxMem=278302556 15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.3 KB, free 265.3 MB) 15/08/06 10:37:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:53311 (size: 12.3 KB, free: 265.4 MB) 15/08/06 10:37:03 INFO SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:-2 I saw that there was an IP in these logs i.e. localhost:53311 I tried connecting to it via Google Chrome and got an exception. 
>>> 15/08/06 10:37:30 WARN TransportChannelHandler: Exception in connection >>> from /127.0.0.1:54056 io.netty.handler.codec.TooLongFrameException: Adjusted frame length exceeds 2147483647: 5135603447292250196 - discarded at io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:501) at io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:477) at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:403) at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:343) at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:249) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:149) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) at java.lang.Thread.run(Thread.java:745) was: I was going through the quick start for spark 1.4.1 at http://spark.apache.org/docs/latest/quick-start.html. I am using pySpark The quick start has textFile = sc.textFile("README.md"). 
I ran that and then the following text appeared in the command line 15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(143840) called with curMem=0, maxMem=278302556 15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 140.5 KB, free 265.3 MB) 15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(12633) called with curMem=143840, maxMem=278302556 15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.3 KB, free 265.3 MB) 15/08/06 10:37:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:53311 (size: 12.3 KB, free: 265.4 MB) 15/08/06 10:37:03 INFO SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:-2 I saw that there was an IP in these logs i.e. localhost:53311 I tried connecting to it via Google Chrome and got an exception. >>> 15/08/06 10:37:30 WARN TransportChannelHandler: Exception in connection >>> from /127.0.0.1:54056 io.netty.handler.codec.TooLongFrameException: Adjusted frame length exceeds 2147483647: 5135603447292250196 - discarded at io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:501) at io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:477) at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:403) at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:343) at io.netty.handler.codec.ByteToMessageDecoder.callDecode(
[jira] [Created] (SPARK-9678) Exception while going through quick start
Aseem Bansal created SPARK-9678: --- Summary: Exception while going through quick start Key: SPARK-9678 URL: https://issues.apache.org/jira/browse/SPARK-9678 Project: Spark Issue Type: Bug Affects Versions: 1.4.1 Environment: Ubuntu 14.0.4 Reporter: Aseem Bansal I was going through the quick start for spark 1.4.1 at http://spark.apache.org/docs/latest/quick-start.html. I am using pySpark The quick start has textFile = sc.textFile("README.md"). I ran that and then the following text appeared in the command line 15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(143840) called with curMem=0, maxMem=278302556 15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 140.5 KB, free 265.3 MB) 15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(12633) called with curMem=143840, maxMem=278302556 15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.3 KB, free 265.3 MB) 15/08/06 10:37:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:53311 (size: 12.3 KB, free: 265.4 MB) 15/08/06 10:37:03 INFO SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:-2 I saw that there was an IP in these logs i.e. localhost:53311 I tried connecting to it via Google Chrome and got an exception. 
>>> 15/08/06 10:37:30 WARN TransportChannelHandler: Exception in connection >>> from /127.0.0.1:54056 io.netty.handler.codec.TooLongFrameException: Adjusted frame length exceeds 2147483647: 5135603447292250196 - discarded at io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:501) at io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:477) at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:403) at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:343) at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:249) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:149) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (KAFKA-2364) Improve documentation for contributing to docs
[ https://issues.apache.org/jira/browse/KAFKA-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659500#comment-14659500 ] Aseem Bansal commented on KAFKA-2364: - I did this but didn't get a reply. Are the replies shown somewhere? > Improve documentation for contributing to docs > -- > > Key: KAFKA-2364 > URL: https://issues.apache.org/jira/browse/KAFKA-2364 > Project: Kafka > Issue Type: Task >Reporter: Aseem Bansal >Priority: Minor > Labels: doc > > While reading the documentation for kafka 8 I saw some improvements that can > be made. But the docs for contributing are not very good at > https://github.com/apache/kafka. It just gives me a URL for svn. But I am not > sure what to do. Can the README.MD file be improved for contributing to docs? > I have submitted patches to groovy and grails by sending PRs via github but > looking at the comments on PRs submitted to kafak it seems PRs via github are > not working for kafka. It would be good to make that work also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2364) Improve documentation for contributing to docs
[ https://issues.apache.org/jira/browse/KAFKA-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649447#comment-14649447 ] Aseem Bansal commented on KAFKA-2364: - You mean dev@kafka.apache.org? I found that on http://kafka.apache.org/contributing.html. Or should I start a discussion on https://groups.google.com/forum/#!forum/kafka-dev? I know you said email but I find forums easier. > Improve documentation for contributing to docs > -- > > Key: KAFKA-2364 > URL: https://issues.apache.org/jira/browse/KAFKA-2364 > Project: Kafka > Issue Type: Task >Reporter: Aseem Bansal >Priority: Minor > Labels: doc > > While reading the documentation for kafka 8 I saw some improvements that can > be made. But the docs for contributing are not very good at > https://github.com/apache/kafka. It just gives me a URL for svn. But I am not > sure what to do. Can the README.MD file be improved for contributing to docs? > I have submitted patches to groovy and grails by sending PRs via github but > looking at the comments on PRs submitted to kafak it seems PRs via github are > not working for kafka. It would be good to make that work also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2364) Improve documentation for contributing to docs
[ https://issues.apache.org/jira/browse/KAFKA-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649430#comment-14649430 ] Aseem Bansal commented on KAFKA-2364: - It says "Create a patch that applies cleanly against SVN trunk.". I understand what that means but isn't this process a bit too complex? While submitting patches to groovy/grails it was very easy. If this is due to not having a git mirror then let me know how I can help. I read https://blogs.apache.org/infra/entry/git_based_websites_available but I am not sure how I can help there. As per this it needs a ticket with apache infra team. Do you mean the migration from SVN to git? If yes, let me know. > Improve documentation for contributing to docs > -- > > Key: KAFKA-2364 > URL: https://issues.apache.org/jira/browse/KAFKA-2364 > Project: Kafka > Issue Type: Task >Reporter: Aseem Bansal >Priority: Minor > Labels: doc > > While reading the documentation for kafka 8 I saw some improvements that can > be made. But the docs for contributing are not very good at > https://github.com/apache/kafka. It just gives me a URL for svn. But I am not > sure what to do. Can the README.MD file be improved for contributing to docs? > I have submitted patches to groovy and grails by sending PRs via github but > looking at the comments on PRs submitted to kafak it seems PRs via github are > not working for kafka. It would be good to make that work also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
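For the "patch that applies cleanly against SVN trunk" step discussed above, the idea is simply a diff taken against an up-to-date checkout that maintainers can apply without conflicts. A git-based sketch of producing and checking such a patch is below (the SVN equivalent would use `svn diff`; the branch name, file name, and messages are illustrative, and the demo runs in a throwaway repository):

```shell
# Sketch: make a local docs change, export it as a patch file, and
# verify it applies cleanly against a pristine tree, all in a
# temporary repository.
set -e
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git checkout -q -b trunk
git config user.email dev@example.invalid
git config user.name dev
echo "old docs" > docs.html
git add docs.html
git commit -qm "initial"

# Local improvement...
echo "improved docs" > docs.html

# ...exported as a patch against trunk.
git diff > improve-docs.patch

# A maintainer checks it applies cleanly before committing it.
git stash -q                       # back to a pristine trunk
git apply --check improve-docs.patch && echo "applies cleanly"
```

The equivalent check on the SVN side is applying the patch to a fresh `svn checkout` of trunk; a GitHub PR automates exactly this produce-and-verify loop, which is why the commenter finds that workflow easier.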
[jira] [Updated] (KAFKA-2364) Improve documentation for contributing to docs
[ https://issues.apache.org/jira/browse/KAFKA-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aseem Bansal updated KAFKA-2364: Priority: Minor (was: Major) > Improve documentation for contributing to docs > -- > > Key: KAFKA-2364 > URL: https://issues.apache.org/jira/browse/KAFKA-2364 > Project: Kafka > Issue Type: Task >Reporter: Aseem Bansal >Priority: Minor > Labels: doc > > While reading the documentation for kafka 8 I saw some improvements that can > be made. But the docs for contributing are not very good at > https://github.com/apache/kafka. It just gives me a URL for svn. But I am not > sure what to do. Can the README.MD file be improved for contributing to docs? > I have submitted patches to groovy and grails by sending PRs via github but > looking at the comments on PRs submitted to kafak it seems PRs via github are > not working for kafka. It would be good to make that work also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2364) Improve documentation for contributing to docs
[ https://issues.apache.org/jira/browse/KAFKA-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aseem Bansal updated KAFKA-2364: Description: While reading the documentation for kafka 8 I saw some improvements that can be made. But the docs for contributing are not very good at https://github.com/apache/kafka. It just gives me a URL for svn. But I am not sure what to do. Can the README.MD file be improved for contributing to docs? I have submitted patches to groovy and grails by sending PRs via github but looking at the comments on PRs submitted to kafka it seems PRs via github are not working for kafka. It would be good to make that work also. was: While reading the documentation for kafka 8 I saw some improvements that can be made. But the docs for contributing are not very good at https://github.com/apache/kafka. It just gives me a URL for svn. But I am not sure what to do. Can the README.MD file be improved for contributing to docs? I have submitted patches to groovy and grails by sending PRs via github but looking at the comments it seems PRs via github are not working for kafka. It would be good to make that work also. > Improve documentation for contributing to docs > -- > > Key: KAFKA-2364 > URL: https://issues.apache.org/jira/browse/KAFKA-2364 > Project: Kafka > Issue Type: Task >Reporter: Aseem Bansal > Labels: doc > > While reading the documentation for kafka 8 I saw some improvements that can > be made. But the docs for contributing are not very good at > https://github.com/apache/kafka. It just gives me a URL for svn. But I am not > sure what to do. Can the README.MD file be improved for contributing to docs? > I have submitted patches to groovy and grails by sending PRs via github but > looking at the comments on PRs submitted to kafka it seems PRs via github are > not working for kafka. It would be good to make that work also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)