>
> With respect to Tableau… their entire interface into the big data world
> revolves around the JDBC/ODBC interface. So if you don’t have that piece as
> part of your solution, you’re DOA with respect to Tableau.
>
> Have you considered Drill as your JDBC connection?
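For reference, whatever sits behind the JDBC interface (Drill, or the Spark Thrift Server discussed elsewhere in this thread) is reached the same way from code. A minimal sketch, assuming a Spark Thrift Server on a placeholder host and port and a hypothetical "events" table:

    import java.sql.DriverManager

    // Register the HiveServer2-compatible driver used by the Spark Thrift Server.
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    // Host, port, database, table, and credentials are placeholders.
    val conn = DriverManager.getConnection(
      "jdbc:hive2://thrift-server-host:10000/default", "user", "")
    val stmt = conn.createStatement()
    val rs   = stmt.executeQuery("SELECT COUNT(*) FROM events")
    while (rs.next()) println(rs.getLong(1))
    rs.close(); stmt.close(); conn.close()

Drill exposes the same java.sql interface through its own driver class and JDBC URL, so the client side looks essentially identical.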
On that note, here is an article that Databricks published on using
TensorFlow in conjunction with Spark.
https://databricks.com/blog/2016/01/25/deep-learning-with-apache-spark-and-tensorflow.html
Cheers,
Ben
> On Oct 19, 2016, at 3:09 AM, Gourav Sengupta <...@gmail.com> wrote:
>
> Agreed. But as it states, deeper integration with Scala is yet to be
> developed.
> Any thoughts on how to use TensorFlow with Scala? Need to write wrappers,
> I think.
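One way to avoid writing wrappers by hand is to call TensorFlow's Java bindings directly from Scala, since they are plain JVM classes. A minimal sketch, assuming the libtensorflow jar and its native library are on the classpath; it only builds and runs a constant op, not a Spark job:

    import org.tensorflow.{Graph, Session, Tensor, TensorFlow}

    object HelloTensorFlow {
      def main(args: Array[String]): Unit = {
        val graph = new Graph()
        val value = s"Hello from TensorFlow ${TensorFlow.version} via Scala"

        // A byte[] tensor is a DT_STRING scalar; attach it to a Const op.
        val input = Tensor.create(value.getBytes("UTF-8"))
        graph.opBuilder("Const", "greeting")
          .setAttr("dtype", input.dataType())
          .setAttr("value", input)
          .build()

        // Run the graph and read the constant back out.
        val session = new Session(graph)
        val output  = session.runner().fetch("greeting").run().get(0)
        println(new String(output.bytesValue(), "UTF-8"))

        output.close(); session.close(); input.close(); graph.close()
      }
    }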
>
>
> On Oct 19, 2016 7:56 AM, “Benjamin Kim” <bbuil...@gmail.com> wrote:
> …delta load data in Spark table cache and expose it through the Thrift Server.
> But you have to implement the loading logic; it can range from very simple to
> very complex depending on your needs.
>
>
> 2016-10-17 19:48 GMT+02:00 Benjamin Kim <bbuil...@gmail.com>:
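A minimal sketch of that pattern, assuming the Thrift Server is started inside the same application (so it shares the session's temp views); the paths and table name are made up for illustration:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

    val spark = SparkSession.builder()
      .appName("delta-load-into-cache")
      .enableHiveSupport()
      .getOrCreate()

    // Initial load: register and cache so JDBC/ODBC clients can query it.
    spark.read.parquet("s3a://my-bucket/events/")
      .createOrReplaceTempView("events_cached")
    spark.sql("CACHE TABLE events_cached")

    // Expose this session's views over JDBC/ODBC.
    HiveThriftServer2.startWithContext(spark.sqlContext)

    // Later, on whatever schedule you choose: fold the new slice into the view.
    val delta = spark.read.parquet("s3a://my-bucket/events/hour=2016101713/")
    spark.table("events_cached").union(delta)
      .createOrReplaceTempView("events_cached")
    spark.sql("CACHE TABLE events_cached")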
Also, I believe there are command-line tools in Kite -
https://github.com/kite-sdk/kite
This might be useful.
Thanks!
2016-12-23 7:01 GMT+09:00 Benjamin Kim <bbuil...@gmail.com>:
Has anyone tried to merge *.gz.parquet files before? I'm trying to merge
them into 1 file after they are output from Spark. Doing a coalesce(1) on
the Spark cluster will not work. It just does not have the resources to do it.
> https://issues.apache.org/jira/browse/PARQUET-460
>
> It seems parquet-tools allows merging small Parquet files into one.
>
>
> Also, I believe there are command-line tools in Kite -
> https://github.com/kite-sdk/kite
>
> This might be useful.
Has anyone tried to merge *.gz.parquet files before? I'm trying to merge them
into 1 file after they are output from Spark. Doing a coalesce(1) on the Spark
cluster will not work. It just does not have the resources to do it. I'm trying
to do it using the command line and not use Spark. I will…
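If the part files fit on one machine, an alternative to parquet-tools is to run the merge as a local-mode Spark job on that machine instead of on the cluster. A rough sketch, with placeholder paths:

    import org.apache.spark.sql.SparkSession

    // local[*] keeps everything on the machine this is launched from.
    val spark = SparkSession.builder()
      .appName("merge-gz-parquet")
      .master("local[*]")
      .getOrCreate()

    spark.read.parquet("s3a://my-bucket/output/2016-12-22/")   // many small *.gz.parquet parts
      .coalesce(1)                                             // one output file
      .write
      .mode("overwrite")
      .option("compression", "gzip")
      .parquet("s3a://my-bucket/output-merged/2016-12-22/")

    spark.stop()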
I’m curious about if and when Spark SQL will ever remove its dependency on the
Hive Metastore. Now that Spark 2.1’s SparkSession has superseded the need for
HiveContext, are there plans to replace the Hive Metastore service with a
“SparkSchema” service backed by PostgreSQL, MySQL, etc.?
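There is no “SparkSchema” service today; the closest existing knob is spark.sql.catalogImplementation, which selects between an in-memory catalog and the Hive Metastore (whose backing database, Derby, PostgreSQL, MySQL, etc., comes from hive-site.xml). A small sketch of the in-memory side:

    import org.apache.spark.sql.SparkSession

    // "in-memory" keeps table metadata only in the driver; switch to "hive"
    // (or call .enableHiveSupport()) to use the Hive Metastore instead.
    val spark = SparkSession.builder()
      .appName("catalog-demo")
      .master("local[*]")
      .config("spark.sql.catalogImplementation", "in-memory")
      .getOrCreate()

    spark.range(10).createOrReplaceTempView("numbers")
    spark.sql("SELECT COUNT(*) FROM numbers").show()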
Has anyone seen AWS Glue? I was wondering if there is something similar going
to be built into Spark Structured Streaming? I like the Data Catalog idea to
store and track any data source/destination. It profiles the data to derive the
schema and data types. Also, it does some sort of automated…
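Plain Spark already does a limited version of the schema-derivation part. A small sketch using CSV type inference (the path is a placeholder), for comparison with what Glue's crawler does:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("infer-schema").getOrCreate()

    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")   // sample the data to guess column types
      .csv("s3a://my-bucket/raw/customers.csv")

    df.printSchema()   // the derived schema, roughly what a crawler would catalog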
Hi Bo,
+1 for your project. I come from the world of data warehouses, ETL, and
reporting analytics. There are many individuals who do not know or want to do
any coding. They are content with ANSI SQL and stick to it. ETL workflows are
also done without any coding, using a drag-and-drop user interface.
With AWS having Glue and GCP having Dataprep, is Databricks coming out with
an equivalent or better? I know that Serverless is a new offering, but will
it go further with automatic data schema discovery, profiling, metadata
storage, change triggering, joining, transform suggestions, etc.?
Just
To add, we have a CDH 5.12 cluster with Spark 2.2 in our data center.
On Mon, Nov 13, 2017 at 3:15 PM Benjamin Kim <bbuil...@gmail.com> wrote:
> Does anyone know if there is a connector for AWS Kinesis that can be used
> as a source for Structured Streaming?
>
> Thanks.
>
>
Does anyone know if there is a connector for AWS Kinesis that can be used
as a source for Structured Streaming?
Thanks.
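For what it is worth, what ships in Apache Spark itself is the DStream-based receiver in spark-streaming-kinesis-asl, not a Structured Streaming source. A minimal sketch with placeholder stream, region, and app names:

    import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.kinesis.KinesisUtils
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("kinesis-dstream")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Each record arrives as raw bytes from the Kinesis Client Library.
    val records = KinesisUtils.createStream(
      ssc, "my-app", "my-stream", "https://kinesis.us-east-1.amazonaws.com",
      "us-east-1", InitialPositionInStream.LATEST, Seconds(10),
      StorageLevel.MEMORY_AND_DISK_2)

    records.map(bytes => new String(bytes, "UTF-8")).print()

    ssc.start()
    ssc.awaitTermination()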
I have a question about this. The documentation compares the concept to
BigQuery. Does this mean that we will no longer need to deal
with instances and just pay for execution duration and amount of data
processed? I’m just curious about how this will be priced.
Also, when will it be ready?
> If I interpreted it correctly: if you're joining then overwrite, otherwise only
> append, as it removes dups.
>
> I think, in this scenario, just change it to write.mode('overwrite') because
> you're already reading the old data and your job would be done.
>
>
> On Sat 2 Jun, 2018, 10:27 PM Benjamin Kim wrote:
> Benjamin,
>
> The append will append the “new” data to the existing data without removing
> the duplicates. You would need to overwrite the file every time if you need
> unique values.
>
> Thanks,
> Jayadeep
>
> On Fri, Jun 1, 2018 at 9:31 PM Benjamin Kim wrote:
I have a situation where I am trying to add only new rows to an existing data set
that lives in S3 as gzipped parquet files, looping and appending for each hour
of the day. First, I create a DF from the existing data, then I use a query to
create another DF with the data that is new. Here is the…
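The code in the message above is cut off, but a minimal sketch of the approach suggested in the replies, read the existing data, keep only rows not already present, and rewrite with overwrite mode, looks roughly like this (bucket names, paths, and the hourly loop are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("append-only-new-rows").getOrCreate()

    val existing = spark.read.parquet("s3a://my-bucket/events/")
    val incoming = spark.read.parquet("s3a://staging-bucket/events/hour=2018060112/")

    // Keep only rows that are not already in the existing data set.
    val newRows = incoming.except(existing)

    // Rewrite the full data set; write to a temporary path first, since
    // overwriting the same path you are reading from in the same job fails.
    existing.union(newRows)
      .write
      .mode("overwrite")
      .option("compression", "gzip")
      .parquet("s3a://my-bucket/events_tmp/")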