[GitHub] uzadude opened a new pull request #5: [DATAFU-148] Spark Support

2019-02-17 Thread GitBox
uzadude opened a new pull request #5: [DATAFU-148] Spark Support URL: https://github.com/apache/datafu/pull/5 Creating this PR to consolidate all the related changes into the first version that supports Spark. [DATAFU-148](https://issues.apache.org/jira/browse/DATAFU-148)

[GitHub] uzadude commented on issue #5: [DATAFU-148] Spark Support

2019-02-17 Thread GitBox
uzadude commented on issue #5: [DATAFU-148] Spark Support URL: https://github.com/apache/datafu/pull/5#issuecomment-464613149 Re-opening after adding ScalaPythonBridge functionality. This is an automated message from the Apach

[GitHub] [datafu] rjurney opened a new pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-04-19 Thread GitBox
rjurney opened a new pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15 Need the pull request to review/discuss.

[GitHub] [datafu] eyala commented on issue #5: [DATAFU-148] Spark Support

2019-04-21 Thread GitBox
eyala commented on issue #5: [DATAFU-148] Spark Support URL: https://github.com/apache/datafu/pull/5#issuecomment-485240762 Merged the commit from Feb 18th.

[GitHub] [datafu] eyala commented on issue #15: Add Spark functionality to DataFu, datafu-spark

2019-05-02 Thread GitBox
eyala commented on issue #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#issuecomment-488644759 I didn't take care of all of Russell's comments, but I had to make some changes in the tests in order for them to pass in all the Spark/Scala ve

[GitHub] [datafu] uzadude closed pull request #5: [DATAFU-148] Spark Support

2019-05-14 Thread GitBox
uzadude closed pull request #5: [DATAFU-148] Spark Support URL: https://github.com/apache/datafu/pull/5

[GitHub] [datafu] uzadude commented on issue #15: Add Spark functionality to DataFu, datafu-spark

2019-05-15 Thread GitBox
uzadude commented on issue #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#issuecomment-492920942 @rjurney, I've reformatted all the code with Spark's scalastyle.xml. I believe this should solve most of the code style issues.

[GitHub] [datafu] matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-21 Thread GitBox
matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r286187133 ## File path: datafu-spark/README.md ## @@ -0,0 +1,71 @@ +# datafu-spark + +datafu-spark contains a numb

[GitHub] [datafu] matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-21 Thread GitBox
matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r286242193 ## File path: datafu-spark/src/main/scala/spark/utils/overwrites/SparkPythonRunner.scala ## @@ -0,0 +1,

[GitHub] [datafu] matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-21 Thread GitBox
matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r286238319 ## File path: datafu-spark/src/test/resources/python_tests/pyfromscala.py ## @@ -0,0 +1,92 @@ +# License

[GitHub] [datafu] matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-21 Thread GitBox
matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r286237816 ## File path: datafu-spark/src/test/resources/python_tests/pyfromscala_with_error.py ## @@ -0,0 +1,18 @

[GitHub] [datafu] matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-21 Thread GitBox
matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r286243461 ## File path: datafu-spark/build.gradle ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foun

[GitHub] [datafu] matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-21 Thread GitBox
matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r286236660 ## File path: datafu-spark/src/test/resources/META-INF/services/datafu.spark.PythonResource ## @@ -0,0

[GitHub] [datafu] matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-21 Thread GitBox
matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r286192534 ## File path: datafu-spark/build_and_test_spark.sh ## @@ -0,0 +1,115 @@ +# Licensed to the Apache Softwa

[GitHub] [datafu] matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-21 Thread GitBox
matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r286240661 ## File path: datafu-spark/src/main/scala/datafu/spark/ScalaPythonBridge.scala ## @@ -0,0 +1,166 @@ +/*

[GitHub] [datafu] matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-21 Thread GitBox
matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r286187579 ## File path: datafu-spark/README.md ## @@ -0,0 +1,71 @@ +# datafu-spark + +datafu-spark contains a numb

[GitHub] [datafu] matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-21 Thread GitBox
matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r286237649 ## File path: datafu-spark/src/test/resources/python_tests/df_utils_tests.py ## @@ -0,0 +1,88 @@ +# Lice

[GitHub] [datafu] matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-21 Thread GitBox
matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r286197035 ## File path: datafu-spark/src/main/resources/META-INF/LICENSE ## @@ -0,0 +1,393 @@ +

[GitHub] [datafu] matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-21 Thread GitBox
matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r286197225 ## File path: datafu-spark/src/main/resources/META-INF/NOTICE ## @@ -0,0 +1,60 @@ +Apache DataFu Revi

[GitHub] [datafu] matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-21 Thread GitBox
matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r286243214 ## File path: datafu-spark/README.md ## @@ -0,0 +1,71 @@ +# datafu-spark + +datafu-spark contains a numb

[GitHub] [datafu] matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-21 Thread GitBox
matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r286188718 ## File path: datafu-spark/build.gradle ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foun

[GitHub] [datafu] matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-21 Thread GitBox
matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r286244813 ## File path: datafu-spark/src/main/scala/datafu/spark/DataFrameOps.scala ## @@ -0,0 +1,92 @@ +/* + * Li

[GitHub] [datafu] matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-21 Thread GitBox
matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r286233883 ## File path: datafu-spark/src/main/resources/pyspark_utils/df_utils.py ## @@ -0,0 +1,161 @@ +# Licensed

[GitHub] [datafu] matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-21 Thread GitBox
matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r286234807 ## File path: datafu-spark/src/main/resources/pyspark_utils/init_spark_context.py ## @@ -0,0 +1,21 @@ +#

[GitHub] [datafu] rjurney commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-27 Thread GitBox
rjurney commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r287922880 ## File path: datafu-spark/build_and_test_spark.sh ## @@ -0,0 +1,115 @@ +# Licensed to the Apache Software

[GitHub] [datafu] rjurney commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-27 Thread GitBox
rjurney commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r287923002 ## File path: datafu-spark/README.md ## @@ -0,0 +1,71 @@ +# datafu-spark + +datafu-spark contains a number

[GitHub] [datafu] rjurney commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-27 Thread GitBox
rjurney commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r287923515 ## File path: datafu-spark/src/main/scala/datafu/spark/DataFrameOps.scala ## @@ -0,0 +1,92 @@ +/* + * Lice

[GitHub] [datafu] eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-30 Thread GitBox
eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r288963450 ## File path: datafu-spark/README.md ## @@ -0,0 +1,71 @@ +# datafu-spark + +datafu-spark contains a number o

[GitHub] [datafu] eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-30 Thread GitBox
eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r288965080 ## File path: datafu-spark/src/test/resources/META-INF/services/datafu.spark.PythonResource ## @@ -0,0 +1,2

[GitHub] [datafu] eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-30 Thread GitBox
eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r289000541 ## File path: datafu-spark/build.gradle ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] [datafu] eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-30 Thread GitBox
eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r289000407 ## File path: datafu-spark/README.md ## @@ -0,0 +1,71 @@ +# datafu-spark + +datafu-spark contains a number o

[GitHub] [datafu] eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-30 Thread GitBox
eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r289000478 ## File path: datafu-spark/README.md ## @@ -0,0 +1,71 @@ +# datafu-spark + +datafu-spark contains a number o

[GitHub] [datafu] eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-30 Thread GitBox
eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r289003198 ## File path: datafu-spark/build_and_test_spark.sh ## @@ -0,0 +1,115 @@ +# Licensed to the Apache Software F

[GitHub] [datafu] eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-30 Thread GitBox
eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r289003243 ## File path: datafu-spark/src/main/resources/META-INF/LICENSE ## @@ -0,0 +1,393 @@ +

[GitHub] [datafu] eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-30 Thread GitBox
eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r289003317 ## File path: datafu-spark/src/main/resources/META-INF/NOTICE ## @@ -0,0 +1,60 @@ +Apache DataFu Review c

[GitHub] [datafu] eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-30 Thread GitBox
eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r289003417 ## File path: datafu-spark/src/test/resources/python_tests/pyfromscala_with_error.py ## @@ -0,0 +1,18 @@ +#

[GitHub] [datafu] eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-30 Thread GitBox
eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r289003367 ## File path: datafu-spark/src/test/resources/python_tests/df_utils_tests.py ## @@ -0,0 +1,88 @@ +# Licensed

[GitHub] [datafu] eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-05-30 Thread GitBox
eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r289003455 ## File path: datafu-spark/src/test/resources/python_tests/pyfromscala.py ## @@ -0,0 +1,92 @@ +# Licensed to

[GitHub] [datafu] uzadude commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-06-03 Thread GitBox
uzadude commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r289847779 ## File path: datafu-spark/src/main/resources/pyspark_utils/df_utils.py ## @@ -0,0 +1,161 @@ +# Licensed t

[GitHub] [datafu] uzadude commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-06-03 Thread GitBox
uzadude commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r289848210 ## File path: datafu-spark/src/main/resources/pyspark_utils/init_spark_context.py ## @@ -0,0 +1,21 @@ +# L

[GitHub] [datafu] uzadude commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-06-03 Thread GitBox
uzadude commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r289851400 ## File path: datafu-spark/src/main/scala/datafu/spark/ScalaPythonBridge.scala ## @@ -0,0 +1,166 @@ +/* +

[GitHub] [datafu] uzadude commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-06-03 Thread GitBox
uzadude commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r289857613 ## File path: datafu-spark/src/main/scala/spark/utils/overwrites/SparkPythonRunner.scala ## @@ -0,0 +1,13

[GitHub] [datafu] uzadude commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-06-03 Thread GitBox
uzadude commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r289864950 ## File path: datafu-spark/src/main/scala/datafu/spark/DataFrameOps.scala ## @@ -0,0 +1,92 @@ +/* + * Lice

[GitHub] [datafu] eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-06-04 Thread GitBox
eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r290209031 ## File path: datafu-spark/build.gradle ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] [datafu] eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-06-11 Thread GitBox
eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r292371000 ## File path: datafu-spark/src/test/resources/META-INF/services/datafu.spark.PythonResource ## @@ -0,0 +1,2

[GitHub] [datafu] uzadude commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-06-13 Thread GitBox
uzadude commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r293340319 ## File path: datafu-spark/build_and_test_spark.sh ## @@ -0,0 +1,115 @@ +# Licensed to the Apache Software

[GitHub] [datafu] matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-07-09 Thread GitBox
matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r301844121 ## File path: datafu-spark/src/main/resources/pyspark_utils/init_spark_context.py ## @@ -0,0 +1,21 @@ +#

[GitHub] [datafu] matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-07-09 Thread GitBox
matthayes commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r301844723 ## File path: datafu-spark/src/test/resources/META-INF/services/datafu.spark.PythonResource ## @@ -0,0

[GitHub] [datafu] matthayes commented on issue #15: Add Spark functionality to DataFu, datafu-spark

2019-07-09 Thread GitBox
matthayes commented on issue #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#issuecomment-509865726 +1 I reviewed the recent code changes and these look good to me. I am able to build the JAR via `assemble`. However I did ru

[GitHub] [datafu] matthayes commented on issue #15: Add Spark functionality to DataFu, datafu-spark

2019-07-09 Thread GitBox
matthayes commented on issue #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#issuecomment-509866615 Okay I think I get what's going on. SparkPythonRunner must be assuming that Python 2.x is being used, however I am using Python 3.6. When

[GitHub] [datafu] matthayes commented on issue #15: Add Spark functionality to DataFu, datafu-spark

2019-07-09 Thread GitBox
matthayes commented on issue #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#issuecomment-509866739 Anyways from my perspective I think we're good to merge in. @rjurney any other comments?

[GitHub] [datafu] eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-07-10 Thread GitBox
eyala commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r302037973 ## File path: datafu-spark/src/test/resources/META-INF/services/datafu.spark.PythonResource ## @@ -0,0 +1,2

[GitHub] [datafu] rjurney commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-07-14 Thread GitBox
rjurney commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r303248340 ## File path: datafu-spark/src/main/resources/pyspark_utils/bridge_utils.py ## @@ -0,0 +1,72 @@ +# License

[GitHub] [datafu] rjurney commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-07-14 Thread GitBox
rjurney commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r303248625 ## File path: datafu-spark/src/main/resources/pyspark_utils/df_utils.py ## @@ -0,0 +1,171 @@ +# Licensed t

[GitHub] [datafu] rjurney commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-07-14 Thread GitBox
rjurney commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r303248700 ## File path: datafu-spark/src/main/resources/pyspark_utils/df_utils.py ## @@ -0,0 +1,171 @@ +# Licensed t

[GitHub] [datafu] uzadude commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-07-16 Thread GitBox
uzadude commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r303910701 ## File path: datafu-spark/src/main/resources/pyspark_utils/bridge_utils.py ## @@ -0,0 +1,72 @@ +# License

[GitHub] [datafu] uzadude commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-07-16 Thread GitBox
uzadude commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r303911042 ## File path: datafu-spark/src/main/resources/pyspark_utils/df_utils.py ## @@ -0,0 +1,171 @@ +# Licensed t

[GitHub] [datafu] uzadude commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-07-16 Thread GitBox
uzadude commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r303911500 ## File path: datafu-spark/src/main/resources/pyspark_utils/df_utils.py ## @@ -0,0 +1,171 @@ +# Licensed t

[GitHub] [datafu] rjurney commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark

2019-07-16 Thread GitBox
rjurney commented on a change in pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#discussion_r304139883 ## File path: datafu-spark/src/main/resources/pyspark_utils/df_utils.py ## @@ -0,0 +1,171 @@ +# Licensed t

[GitHub] [datafu] eyala commented on issue #15: Add Spark functionality to DataFu, datafu-spark

2019-07-17 Thread GitBox
eyala commented on issue #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#issuecomment-512185721 Merged.

[GitHub] [datafu] matthayes commented on issue #15: Add Spark functionality to DataFu, datafu-spark

2019-07-17 Thread GitBox
matthayes commented on issue #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15#issuecomment-512481089 Russell, can you close this now since it is merged already?

[GitHub] [datafu] uzadude opened a new pull request #16: [DATAFU-153] Add support for Python 3

2020-03-27 Thread GitBox
uzadude opened a new pull request #16: [DATAFU-153] Add support for Python 3 URL: https://github.com/apache/datafu/pull/16 ## Summary With minor code changes, we can easily support Python 3 as well.

[GitHub] [datafu] matthayes commented on issue #16: [DATAFU-153] Add support for Python 3

2020-03-31 Thread GitBox
matthayes commented on issue #16: [DATAFU-153] Add support for Python 3 URL: https://github.com/apache/datafu/pull/16#issuecomment-606787541 Already merged separately.

[GitHub] [datafu] matthayes closed pull request #15: Add Spark functionality to DataFu, datafu-spark

2020-03-31 Thread GitBox
matthayes closed pull request #15: Add Spark functionality to DataFu, datafu-spark URL: https://github.com/apache/datafu/pull/15

[GitHub] [datafu] matthayes closed pull request #16: [DATAFU-153] Add support for Python 3

2020-03-31 Thread GitBox
matthayes closed pull request #16: [DATAFU-153] Add support for Python 3 URL: https://github.com/apache/datafu/pull/16

[GitHub] [datafu] matthayes closed pull request #1: AhoCorasickMatch UDF with unit tests

2020-03-31 Thread GitBox
matthayes closed pull request #1: AhoCorasickMatch UDF with unit tests URL: https://github.com/apache/datafu/pull/1

[GitHub] [datafu] matthayes commented on issue #2: Added UDF ZipBags which can zip and arbitrary number of bags into one

2020-03-31 Thread GitBox
matthayes commented on issue #2: Added UDF ZipBags which can zip and arbitrary number of bags into one URL: https://github.com/apache/datafu/pull/2#issuecomment-606788461 Already merged.

[GitHub] [datafu] matthayes closed pull request #2: Added UDF ZipBags which can zip and arbitrary number of bags into one

2020-03-31 Thread GitBox
matthayes closed pull request #2: Added UDF ZipBags which can zip and arbitrary number of bags into one URL: https://github.com/apache/datafu/pull/2

[GitHub] [datafu] matthayes commented on issue #3: Enhance InUDF to support tuple version and add java compatibility for datafu-pig

2020-03-31 Thread GitBox
matthayes commented on issue #3: Enhance InUDF to support tuple version and add java compatibility for datafu-pig URL: https://github.com/apache/datafu/pull/3#issuecomment-606788743 Closing this, as a JIRA was filed and resolved as Won't Fix.

[GitHub] [datafu] matthayes closed pull request #3: Enhance InUDF to support tuple version and add java compatibility for datafu-pig

2020-03-31 Thread GitBox
matthayes closed pull request #3: Enhance InUDF to support tuple version and add java compatibility for datafu-pig URL: https://github.com/apache/datafu/pull/3

[GitHub] [datafu] XinyuLiu5566 opened a new pull request #17: refactor code in PathUtil.java

2021-10-09 Thread GitBox
XinyuLiu5566 opened a new pull request #17: URL: https://github.com/apache/datafu/pull/17 According to the CodeGuru report, similar code fragments were detected in the same file at the following lines: 270:284, 295:309. I refactored the code to remove the duplicates.

[GitHub] [datafu] eyala commented on pull request #17: refactor code in PathUtil.java

2021-10-11 Thread GitBox
eyala commented on pull request #17: URL: https://github.com/apache/datafu/pull/17#issuecomment-939903175 Hello! I'm glad to see your contributions. I went over the first commit before you added more (it looks fine), but I see you've made more changes - we might want to split them into the

[GitHub] [datafu] eyala opened a new pull request #18: Fix java compilation in codeql workflow and update some dependencies

2021-10-27 Thread GitBox
eyala opened a new pull request #18: URL: https://github.com/apache/datafu/pull/18
1. Fixes the CodeQL build by upgrading the Gradle version used for the Gradle wrapper
2. Updates libraries used by the website
3. Replaces some http URLs with https

[GitHub] [datafu] eyala merged pull request #18: Fix java compilation in codeql workflow and update some dependencies

2021-11-06 Thread GitBox
eyala merged pull request #18: URL: https://github.com/apache/datafu/pull/18

[GitHub] [datafu] XinyuLiu5566 commented on pull request #17: refactor code in PathUtil.java

2021-11-15 Thread GitBox
XinyuLiu5566 commented on pull request #17: URL: https://github.com/apache/datafu/pull/17#issuecomment-969387640 Hi, sorry for the late reply! How do you want me to split the changes?

[GitHub] [datafu] eyala commented on pull request #17: refactor code in PathUtil.java

2021-11-16 Thread GitBox
eyala commented on pull request #17: URL: https://github.com/apache/datafu/pull/17#issuecomment-970103773 You can divide it by project - maybe leave the datafu-pig commits here, and make a new PR with the datafu-hourglass ones. How did you test this? When I try to build your branch I

[GitHub] [datafu] XinyuLiu5566 commented on pull request #17: refactor code in PathUtil.java

2021-12-08 Thread GitBox
XinyuLiu5566 commented on pull request #17: URL: https://github.com/apache/datafu/pull/17#issuecomment-989480453 After checking the documentation of lombok.NonNull, I think it's better to leave the code unchanged, so I will open another PR without lombok.NonNull.

[GitHub] [datafu] eyala commented on pull request #17: refactor code in PathUtil.java

2022-01-09 Thread GitBox
eyala commented on pull request #17: URL: https://github.com/apache/datafu/pull/17#issuecomment-1008300825 You can either open a new PR or just remove the change from this one, either option is fine. But are you successfully running all the tests? If you have changes that aren't exe

[GitHub] [datafu] dependabot[bot] opened a new pull request #19: Bump nokogiri from 1.12.5 to 1.13.2 in /site

2022-02-28 Thread GitBox
dependabot[bot] opened a new pull request #19: URL: https://github.com/apache/datafu/pull/19 Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.12.5 to 1.13.2. Release notes: sourced from [nokogiri's releases](https://github.com/sparklemotion/nokogiri/releases).

[GitHub] [datafu] eyala merged pull request #19: Bump nokogiri from 1.12.5 to 1.13.2 in /site

2022-02-28 Thread GitBox
eyala merged pull request #19: URL: https://github.com/apache/datafu/pull/19

[GitHub] [datafu] dependabot[bot] opened a new pull request, #20: Bump nokogiri from 1.13.2 to 1.13.4 in /site

2022-04-12 Thread GitBox
dependabot[bot] opened a new pull request, #20: URL: https://github.com/apache/datafu/pull/20 Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.13.2 to 1.13.4. Release notes Sourced from [nokogiri's releases](https://github.com/sparklemotion/nokogiri/releases).

[GitHub] [datafu] eyala merged pull request #20: Bump nokogiri from 1.13.2 to 1.13.4 in /site

2022-04-12 Thread GitBox
eyala merged PR #20: URL: https://github.com/apache/datafu/pull/20

[GitHub] [datafu] petrpulc opened a new pull request, #21: Keep the unmatched single records in joinWithRange

2022-04-13 Thread GitBox
petrpulc opened a new pull request, #21: URL: https://github.com/apache/datafu/pull/21 The filter at the end in fact causes the join to behave like 'inner' because it filters out the records from singleDf that have no matching range... because range_start and range_end are null in that case
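The reasoning above hinges on SQL NULL semantics: after a left join, unmatched rows carry NULL in `range_start` and `range_end`, and a range predicate over NULL is not TRUE, so a final filter silently discards them. Below is a minimal plain-Python sketch of that effect (no Spark), assuming simplified three-valued logic (any NULL bound yields NULL) and SQL's rule that WHERE keeps only rows whose predicate is TRUE. It is not DataFu's `joinWithRange` code; the sample rows and the helpers `sql_between` and `where` are hypothetical.

```python
from typing import Optional

# Hypothetical rows (single, range_start, range_end) as produced by a LEFT join;
# a single value with no matching range carries None (SQL NULL) in both bounds.
left_joined = [
    (5, 1, 10),        # matched: 5 lies in [1, 10]
    (42, None, None),  # unmatched: no range contains 42
]


def sql_between(value: int, lo: Optional[int], hi: Optional[int]) -> Optional[bool]:
    """Simplified SQL `value BETWEEN lo AND hi`: any NULL bound yields NULL (None)."""
    if lo is None or hi is None:
        return None
    return lo <= value <= hi


def where(rows, predicate):
    """SQL WHERE semantics: keep a row only when the predicate is TRUE (not FALSE/NULL)."""
    return [r for r in rows if predicate(r) is True]


# Filtering on the range after the left join drops the unmatched row,
# so the result is the same as an INNER join.
inner_like = where(left_joined, lambda r: sql_between(*r))

# Explicitly keeping rows whose range is NULL preserves left-join semantics.
left_preserving = where(
    left_joined,
    lambda r: r[1] is None or sql_between(*r) is True,
)

print(inner_like)       # [(5, 1, 10)]
print(left_preserving)  # [(5, 1, 10), (42, None, None)]
```

Choosing between these two filters is, in effect, what a join-type option would control.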

[GitHub] [datafu] eyala commented on pull request #21: Keep the unmatched single records in joinWithRange

2022-04-15 Thread GitBox
eyala commented on PR #21: URL: https://github.com/apache/datafu/pull/21#issuecomment-1099986584 I think you're correct in your analysis - this does make the join basically an inner join. There are two issues that need to be addressed before we can merge this, one theoretical and one practi

[GitHub] [datafu] uzadude commented on pull request #21: Keep the unmatched single records in joinWithRange

2022-04-15 Thread GitBox
uzadude commented on PR #21: URL: https://github.com/apache/datafu/pull/21#issuecomment-113021 Sure, let's add a `joinType` parameter like in the skew join methods. Let's keep it backward compatible.

[GitHub] [datafu] petrpulc closed pull request #21: Keep the unmatched single records in joinWithRange

2022-04-27 Thread GitBox
petrpulc closed pull request #21: Keep the unmatched single records in joinWithRange URL: https://github.com/apache/datafu/pull/21

[GitHub] [datafu] petrpulc commented on pull request #21: Keep the unmatched single records in joinWithRange

2022-04-27 Thread GitBox
petrpulc commented on PR #21: URL: https://github.com/apache/datafu/pull/21#issuecomment-52 Hi, I agree with your suggestions. The initial change set was just meant to spark (pun intended) the discussion and to serve as my braindump in case someone would like to take the issue faster than I was able to

[GitHub] [datafu] petrpulc commented on pull request #21: Keep the unmatched single records in joinWithRange

2022-04-27 Thread GitBox
petrpulc commented on PR #21: URL: https://github.com/apache/datafu/pull/21#issuecomment-101928 Well, during testing I actually found a pretty serious issue... if the record falls into `decreased_range_single`, but `range_start` and `range_end` do not contain `single`, then I would nee

[GitHub] [datafu] eyala commented on pull request #21: Keep the unmatched single records in joinWithRange

2022-05-16 Thread GitBox
eyala commented on PR #21: URL: https://github.com/apache/datafu/pull/21#issuecomment-1127673751 I think you're right. I would say that it's still worth doing ... but if there are multiple records with the same "key" (the column provided as _single_) I don't see how the records without a ra

[GitHub] [datafu] dependabot[bot] opened a new pull request, #22: Bump nokogiri from 1.13.4 to 1.13.5 in /site

2022-05-18 Thread GitBox
dependabot[bot] opened a new pull request, #22: URL: https://github.com/apache/datafu/pull/22 Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.13.4 to 1.13.5. Release notes Sourced from [nokogiri's releases](https://github.com/sparklemotion/nokogiri/releases).

[GitHub] [datafu] eyala merged pull request #22: Bump nokogiri from 1.13.4 to 1.13.5 in /site

2022-05-19 Thread GitBox
eyala merged PR #22: URL: https://github.com/apache/datafu/pull/22

[GitHub] [datafu] dependabot[bot] opened a new pull request, #23: Bump nokogiri from 1.13.5 to 1.13.6 in /site

2022-05-23 Thread GitBox
dependabot[bot] opened a new pull request, #23: URL: https://github.com/apache/datafu/pull/23 Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.13.5 to 1.13.6. Release notes Sourced from [nokogiri's releases](https://github.com/sparklemotion/nokogiri/releases).

[GitHub] [datafu] eyala merged pull request #23: Bump nokogiri from 1.13.5 to 1.13.6 in /site

2022-06-01 Thread GitBox
eyala merged PR #23: URL: https://github.com/apache/datafu/pull/23

[GitHub] [datafu] eyala commented on pull request #21: Keep the unmatched single records in joinWithRange

2022-06-27 Thread GitBox
eyala commented on PR #21: URL: https://github.com/apache/datafu/pull/21#issuecomment-1167388998 If you want to submit just your test cases, I've made [a JIRA issue for generic test improvements](https://issues.apache.org/jira/browse/DATAFU-164).

[GitHub] [datafu] benraha opened a new pull request, #24: Added a UDAF with the collect_limited_list functionality

2022-07-10 Thread GitBox
benraha opened a new pull request, #24: URL: https://github.com/apache/datafu/pull/24 Added collectLimitedList, a UDAF which is like collect_list but receives a parameter that limits the number of items to be collected; the items are chosen randomly. This is useful when one wants to collect items
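A bounded, random "collect_list" can be built on reservoir sampling (Algorithm R), which keeps at most `limit` items while giving every input the same chance of being kept, in a single pass and with memory proportional to `limit`. The sketch below is a plain-Python illustration of that technique only; whether PR #24 uses reservoir sampling is an assumption, and the class `LimitedRandomCollector` with its `update`/`result` methods is hypothetical, not DataFu's collectLimitedList API.

```python
import random
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class LimitedRandomCollector(Generic[T]):
    """Collect at most `limit` items, each input equally likely to be kept.

    Hypothetical helper using reservoir sampling (Algorithm R);
    not DataFu's collectLimitedList implementation.
    """

    def __init__(self, limit: int, rng: Optional[random.Random] = None) -> None:
        if limit < 0:
            raise ValueError("limit must be non-negative")
        self.limit = limit
        self.seen = 0                 # number of items offered so far
        self.reservoir: List[T] = []
        self.rng = rng or random.Random()

    def update(self, item: T) -> None:
        self.seen += 1
        if len(self.reservoir) < self.limit:
            # Fill phase: keep the first `limit` items unconditionally.
            self.reservoir.append(item)
        else:
            # Keep the new item with probability limit / seen,
            # overwriting a uniformly chosen slot.
            j = self.rng.randrange(self.seen)
            if j < self.limit:
                self.reservoir[j] = item

    def result(self) -> List[T]:
        return list(self.reservoir)


# Usage: keep 3 random values out of 100 (seeded for reproducibility).
collector = LimitedRandomCollector(limit=3, rng=random.Random(7))
for x in range(100):
    collector.update(x)
print(collector.result())  # three distinct values from 0..99
```

A distributed aggregate would also need to merge partial reservoirs from different partitions, weighting each side by its `seen` count; that step is omitted here.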

[GitHub] [datafu] uzadude opened a new pull request, #25: Adding SparkUDFs

2022-07-12 Thread GitBox
uzadude opened a new pull request, #25: URL: https://github.com/apache/datafu/pull/25 # Summary - Adding a register UDFs annotation to conveniently register handy UDFs. - Adding a few simple handy UDFs

[GitHub] [datafu] uzadude commented on pull request #25: Adding SparkUDFs

2022-07-12 Thread GitBox
uzadude commented on PR #25: URL: https://github.com/apache/datafu/pull/25#issuecomment-1181744112 @eyala please check this out.

[GitHub] [datafu] eyala opened a new pull request, #26: Update Spark versions for testing script

2022-07-13 Thread GitBox
eyala opened a new pull request, #26: URL: https://github.com/apache/datafu/pull/26 Belatedly make sure our testing script uses the latest Spark versions. Hopefully this will also activate the test-on-pr action.
