[DISCUSS] [Spark SQL, PySpark] Combining StructTypes into a new StructType

2022-08-09 Thread Tim
why this is not yet part of StructType's functionality? If you support this idea, I could create a first PR for further and deeper discussion. Best Tim - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Dropping SortExec from SortMergeJoins on presorted data

2019-03-29 Thread tim
xpressions. This breaks in cases where our processing has caused the data to *lose* its sortedness. Have we missed something simple or do we have an exotic use-case unlike other users? Thanks! Tim -- Sent from: http://apache-spark-developers-list.1001551.n3.

Re: Honor ParseMode in AvroFileFormat

2019-03-07 Thread tim
/facepalm Here we go: https://issues.apache.org/jira/browse/SPARK-27093 Tim

Re: Honor ParseMode in AvroFileFormat

2019-03-07 Thread tim
Thanks Xiao, it's good to have that validated. I've created a ticket here: https://issues.apache.org/jira/browse/AVRO-2342

Honor ParseMode in AvroFileFormat

2019-03-07 Thread tim
. Is there any reason why this behavior doesn't exist, or an obvious workaround that I missed? If not, are there any further details needed to consider adding this capability to Spark's Avro reader? I’m happy to propose a solution and contribute this update if somebody isn't already working on it. Thanks, Tim

Re: eager execution and debuggability

2018-05-09 Thread Tim Hunter
ard trick in lazy environments and languages. Tim On Wed, May 9, 2018 at 3:26 AM, Reynold Xin <r...@databricks.com> wrote: > Yes would be great if possible but it’s non trivial (might be impossible > to do in general; we already have stacktraces that point to line numbers > w

[ml] Deep learning talks at the Spark Summit Europe

2017-10-10 Thread Tim Hunter
and TensorFlow as a service, by Jim Dowling If you have not gotten your ticket yet, there is still time! You can use the promo code DatabricksEU for a 15% discount. Looking forward to meeting the dev community on the East side of the Atlantic. Tim

Re: [VOTE][SPIP] SPARK-21866 Image support in Apache Spark

2017-09-28 Thread Tim Hunter
i-dimensional tensors >>> too. >>> >>> Matei >>> >>> > On Sep 23, 2017, at 7:27 AM, Yanbo Liang <yblia...@gmail.com> wrote: >>> > >>> > +1 >>> > >>> > On Sat, Sep 23, 2017 at 7:08 PM, Noman Khan <

[VOTE][SPIP] SPARK-21866 Image support in Apache Spark

2017-09-21 Thread Tim Hunter
Hello community, I would like to call for a vote on SPARK-21866. It is a short proposal that has important applications for image processing and deep learning. Joseph Bradley has offered to be the shepherd. JIRA ticket: https://issues.apache.org/jira/browse/SPARK-21866 PDF version:

SPIP: SPARK-21866 Image support in Apache Spark

2017-09-05 Thread Tim Hunter
Hello community, I would like to start a discussion about adding support for images in Spark. We will follow up with a formal vote in two weeks. Please feel free to comment on the JIRA ticket too. JIRA ticket: https://issues.apache.org/jira/browse/SPARK-21866 PDF version:

Re: Question on Spark's graph libraries roadmap

2017-03-13 Thread Tim Hunter
on popular demand. Along these lines, GraphBLAS could be added on top of it if someone is willing to step up. Tim [1] https://spark-summit.org/east-2016/events/graphframes-graph-queries-in-spark-sql/ On Mon, Mar 13, 2017 at 2:58 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

Re: [Spark Namespace]: Expanding Spark ML under Different Namespace?

2017-02-24 Thread Tim Hunter
Regarding logging, GraphFrames makes a simple wrapper this way: https://github.com/graphframes/graphframes/blob/master/src/main/scala/org/graphframes/Logging.scala Regarding the UDTs, they have been hidden to be reworked for Datasets, the reasons being detailed here [1]. Can you describe your

Re: Feedback on MLlib roadmap process proposal

2017-02-23 Thread Tim Hunter
works well in practice. In the meantime, though, there are plenty of things that we could do to help developers of other libraries to have a great experience with Spark. Matei alluded to that in his Spark Summit keynote when he mentioned better integration with low-level libraries. Tim On Thu, Feb 23

Re: Design document - MLlib's statistical package for DataFrames

2017-02-17 Thread Tim Hunter
Hi Brad, this task is focusing on moving the existing algorithms, so that we are held up by parity issues. Do you have some paper suggestions for cardinality? I do not think there is a feature request on JIRA either. Tim On Thu, Feb 16, 2017 at 2:21 PM, bradc <brad.carl...@oracle.com>

Design document - MLlib's statistical package for DataFrames

2017-02-16 Thread Tim Hunter
is rapidly approaching, and it would be great if we could claim parity for this release! Cheers Tim

Re: Spark Improvement Proposals

2017-01-05 Thread Tim Hunter
an opinion on these, but why not make a pick and reevaluate this decision later? This is not a binding process at this point. Tim On Tue, Jan 3, 2017 at 3:16 PM, Cody Koeninger <c...@koeninger.org> wrote: > I don't have a concern about voting vs consensus. > > I have a concern t

GraphFrames 0.2.0 released

2016-08-16 Thread Tim Hunter
the DataFrame API, combined with a new API for motif finding. The user also benefits from DataFrame performance optimizations within the Spark SQL engine. Cheers Tim

Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Tim Hunter
+1 This release passes all tests on the graphframes and tensorframes packages. On Wed, Jun 22, 2016 at 7:19 AM, Cody Koeninger wrote: > If we're considering backporting changes for the 0.8 kafka > integration, I am sure there are people who would like to get > >

Request for comments: Tensorframes, an integration library between TensorFlow and Spark DataFrames

2016-03-19 Thread Tim Hunter
Tim Hunter

Re: [VOTE] Release Apache Spark 1.6.1 (RC1)

2016-03-03 Thread Tim Preece
Regarding the failure in org.apache.spark.streaming.kafka.DirectKafkaStreamSuite ("offset recovery"): we have been seeing the very same problem with the IBM JDK for quite a long time (since at least July 2015). It is intermittent and we had dismissed it as a testcase problem. -- View this

Re: [VOTE] Release Apache Spark 1.6.1 (RC1)

2016-03-03 Thread Tim Preece
I just created the following pull request (against master, but I would like it in 1.6.1) for the isolated classloader fix (SPARK-13648): https://github.com/apache/spark/pull/11495

Re: [VOTE] Release Apache Spark 1.6.1 (RC1)

2016-03-03 Thread Tim Preece
I have been testing 1.6.1RC1 using the IBM Java SDK. I notice a problem ( with the org.apache.spark.sql.hive.client.VersionsSuite tests ) after a recent Spark 1.6.1 change. Pull request - https://github.com/apache/spark/commit/f7898f9e2df131fa78200f6034508e74a78c2a44 The change introduced a

Introducing spark-sklearn, a scikit-learn integration package for Spark

2016-02-10 Thread Tim Hunter
, documentation or code contributions are much welcome (Apache 2.0 license). Cheers Tim

Re: Tungsten in a mixed endian environment

2016-01-15 Thread Tim Preece
So if Spark does not support heterogeneous-endianness clusters, should Spark at least always support homogeneous-endianness clusters? I ask because I just noticed https://issues.apache.org/jira/browse/SPARK-12785 which appears to be introducing a new feature designed for Little Endian only.

Re: A proposal for Spark 2.0

2015-11-11 Thread Tim Preece
Considering Spark 2.x will run for 2 years, would moving up to Scala 2.12 (pencilled in for Jan 2016) make any sense? Although that would then pre-req Java 8.

Re: Block Transfer Service encryption support

2015-11-10 Thread Tim Preece
So it appears the tests fail because of an SSLHandshakeException. Tracing the failure I see:
3,0001,Using SSLEngineImpl.
3,0001,Is initial handshake: true
3,0001,Ignoring unsupported cipher suite: SSL_RSA_WITH_DES_CBC_SHA for TLSv1.2
3,0001,No available cipher suite for TLSv1.2

Re: Block Transfer Service encryption support

2015-11-10 Thread Tim Preece
etchIntegrationSuite.fetchFileChunk:184 expected:<[]> but was:<[1]> SslTransportClientFactorySuite>TransportClientFactorySuite.neverReturnInactiveClients:165 null SslTransportClientFactorySuite>TransportClientFactorySuite.returnDifferentClientsForDifferentServers:145 null Tim

Re: Some spark apps fail with "All masters are unresponsive", while others pass normally

2015-11-09 Thread Tim Preece
Searching shows several people hit this same NPE in AppClient.scala line 160 (perhaps because appID was null; could the application have been stopped before it was registered?)

Re: Anyone has perfect solution for spark source code compilation issue on intellij

2015-11-09 Thread Tim Preece
I've had success building with Maven (3.3.3) with: IntelliJ 14.1.5, Scala 2.10.4, OpenJDK 7 (1.7.0_79). What OS/platform are you on?

Intermittent timeout failure org/apache/spark/sql/hive/thriftserver/CliSuite.scala

2015-08-12 Thread Tim Preece
(e.g. https://issues.apache.org/jira/browse/SPARK-7973) may be a result of this Scala issue. I am new to the Spark community. Is there a preferred way to track the fact that the Spark testcase CliSuite has a dependency on the above Scala issue? Tim Preece

Re: [ANNOUNCE] Ending Java 6 support in Spark 1.5 (Sep 2015)

2015-05-19 Thread Tim Ellison
Sean, Did the JIRA get created? If so I can't find it so a pointer would be helpful. Regards, Tim On 06/05/15 06:59, Reynold Xin wrote: Sean - Please do. On Tue, May 5, 2015 at 10:57 PM, Sean Owen so...@cloudera.com wrote: OK to file a JIRA to scrape out a few Java 6-specific things

Re: running the Terasort example

2014-12-17 Thread Tim Harsch
On 12/16/14, 11:42 PM, Ewan Higgs ewan.hi...@ugent.be wrote: Hi Tim, On 16 Dec 2014, at 19:27, Tim Harsch thar...@cray.com wrote: Hi Ewan, Thanks, I think I was just a bit confused at the time, I was looking at the spark-perf repo when there was the problem (uh.. ok)… The PR that I am

Re: running the Terasort example

2014-12-16 Thread Tim Harsch
/terasort/TeraOutputFormat.scala:76: value hsync is not a member of org.apache.hadoop.fs.FSDataOutputStream [ERROR] out.hsync(); [ERROR] ^ I can get past this by setting hadoop.version to 2.5.0 in the parent pom. Thanks, Tim On 12/16/14, 12:38 AM, Ewan Higgs ewan.hi

running the Terasort example

2014-12-11 Thread Tim Harsch
changes weren't pushed? Thanks for any help, Tim