[jira] [Updated] (HUDI-1551) Support Partition with BigDecimal/Integer field

2021-04-06 Thread Chanh Le (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chanh Le updated HUDI-1551: --- Description: In my data the time indicator field is in BigDecimal/Integer -> due to trading data related

[jira] [Updated] (HUDI-1551) Support Partition with BigDecimal/Integer field

2021-04-06 Thread Chanh Le (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chanh Le updated HUDI-1551: --- Fix Version/s: (was: 0.7.0) > Support Partition with BigDecimal/Integer fi

[jira] [Updated] (HUDI-1551) Support Partition with BigDecimal/Integer field

2021-04-06 Thread Chanh Le (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chanh Le updated HUDI-1551: --- Summary: Support Partition with BigDecimal/Integer field (was: Support Partition with BigDecimal field

[jira] [Created] (HUDI-1551) Support Partition with BigDecimal field

2021-01-25 Thread Chanh Le (Jira)
Chanh Le created HUDI-1551: -- Summary: Support Partition with BigDecimal field Key: HUDI-1551 URL: https://issues.apache.org/jira/browse/HUDI-1551 Project: Apache Hudi Issue Type: New Feature

Re: SPARK environment settings issue when deploying a custom distribution

2017-06-12 Thread Chanh Le
=2.7.0 -Phive -Phive-thriftserver -Pmesos -Pyarn On Mon, Jun 12, 2017 at 6:14 PM Chanh Le wrote: > Hi everyone, > > Recently I discovered an issue when processing csv of spark. So I decided > to fix it following this https://issues.apache.org/jira/browse/SPARK-21024 I >

SPARK environment settings issue when deploying a custom distribution

2017-06-12 Thread Chanh Le
Hi everyone, Recently I discovered an issue when processing CSV files with Spark, so I decided to fix it following https://issues.apache.org/jira/browse/SPARK-21024. I built a custom distribution for internal use. I built it on my local machine and then uploaded the distribution to the server. The server's *~/.ba

Re: [CSV] If the number of columns of one row is bigger than maxColumns, it stops the whole parsing process.

2017-06-09 Thread Chanh Le
Hi Takeshi, Thank you very much. Regards, Chanh On Thu, Jun 8, 2017 at 11:05 PM Takeshi Yamamuro wrote: > I filed a jira about this issue: > https://issues.apache.org/jira/browse/SPARK-21024 > > On Thu, Jun 8, 2017 at 1:27 AM, Chanh Le wrote: > >> Can you recomm

Re: [CSV] If the number of columns of one row is bigger than maxColumns, it stops the whole parsing process.

2017-06-08 Thread Chanh Le
Can you recommend one? Thanks. On Thu, Jun 8, 2017 at 2:47 PM Jörn Franke wrote: > You can change the CSV parser library > > On 8. Jun 2017, at 08:35, Chanh Le wrote: > > > I did add mode -> DROPMALFORMED but it still couldn't ignore it because > the error ra

Re: [CSV] If the number of columns of one row is bigger than maxColumns, it stops the whole parsing process.

2017-06-07 Thread Chanh Le
have more than maxColumns. Choose mode "DROPMALFORMED" > > On 8. Jun 2017, at 03:04, Chanh Le wrote: > > Hi Takeshi, Jörn Franke, > > The problem is that even if I increase maxColumns, some lines still > have more columns than the limit I set, and it will cost a lot o

Re: [CSV] If the number of columns of one row is bigger than maxColumns, it stops the whole parsing process.

2017-06-07 Thread Chanh Le
5 AM, Jörn Franke wrote: > >> Spark CSV data source should be able >> >> On 7. Jun 2017, at 17:50, Chanh Le wrote: >> >> Hi everyone, >> I am using Spark 2.1.1 to read csv files and convert to avro files. >> One problem that I am facing is if one row of

[CSV] If the number of columns of one row is bigger than maxColumns, it stops the whole parsing process.

2017-06-07 Thread Chanh Le
Hi everyone, I am using Spark 2.1.1 to read CSV files and convert them to Avro files. One problem I am facing is that if one row of the CSV file has more columns than maxColumns (default is 20480), the parsing process stops. Internal state when the error was thrown: line=1, column=3, record=0, charIndex=
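A small, hedged sketch (Scala, Spark 2.x) of the two knobs this thread discusses, assuming an existing SparkSession `spark` and an illustrative input path. Note the thread's point still stands: a row that exceeds maxColumns aborts parsing before DROPMALFORMED can drop it, which is what SPARK-21024 tracks.

    // Raise the univocity parser limit and ask Spark to drop rows it cannot parse.
    val df = spark.read
      .option("header", "true")
      .option("maxColumns", "100000")      // default is 20480
      .option("mode", "DROPMALFORMED")     // drop malformed rows instead of failing
      .csv("/data/input/*.csv")            // hypothetical input path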

Re: How to write a query with not-contain, not-start-with, not-end-with conditions effectively?

2017-02-21 Thread Chanh Le
ike '%sell%', then you can just > try left semi join, which Spark will use SortMerge join in this case, I guess. > > Yong > > From: Yong Zhang mailto:java8...@hotmail.com>> > Sent: Tuesday, February 21, 2017 1:17 PM > To: Sidney Feiner; Chanh Le; user @spar

Re: How to write a query with not-contain, not-start-with, not-end-with conditions effectively?

2017-02-21 Thread Chanh Le
4:56 PM, Chanh Le wrote: > > Hi everyone, > > I am working on a dataset like this > user_id url > 1 lao.com/buy <http://lao.com/buy> > 2 bao.com/sell <http://bao.com/sell> > 2 cao.com/market <http://ca

How to write a query with not-contain, not-start-with, not-end-with conditions effectively?

2017-02-21 Thread Chanh Le
Hi everyone, I am working on a dataset like this (user_id, url): (1, lao.com/buy), (2, bao.com/sell), (2, cao.com/market), (1, lao.com/sell), (3, vui.com/sell). I have to find all user_id whose url does not contain "sell". Wh

Re: How to config zookeeper quorum in sqlline command?

2017-02-15 Thread Chanh Le
Juvenn Woo > Sent with Sparrow <http://www.sparrowmailapp.com/?sig> > > On Thursday, 16 February 2017 at 12:41 PM, Chanh Le wrote: > >> Hi everybody, >> I am a newbie start using phoenix for a few days after did some research >> about config zookeeper quorum and still stuc

How to config zookeeper quorum in sqlline command?

2017-02-15 Thread Chanh Le
Hi everybody, I am a newbie who started using Phoenix a few days ago. After doing some research about configuring the ZooKeeper quorum and still being stuck, I finally want to ask the community directly. My current ZK quorum is a little odd: "hbase.zookeeper.quorum", "zoo1:2182,zoo1:2183,zoo2:2182" I edited the env

How to set classpath for a job that submit to Mesos cluster

2016-12-13 Thread Chanh Le
Hi everyone, I have a job that reads segment data from Druid and then converts it to CSV. When I run it in local mode it works fine. /home/airflow/spark-2.0.2-bin-hadoop2.7/bin/spark-submit --driver-memory 1g --master "local[4]" --files /home/airflow/spark-jobs/forecast_jobs/prod.conf --conf spark.execut

[jira] [Created] (ZEPPELIN-1723) Math formula support library path error

2016-11-28 Thread Chanh Le (JIRA)
Chanh Le created ZEPPELIN-1723: -- Summary: Math formula support library path error Key: ZEPPELIN-1723 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1723 Project: Zeppelin Issue Type: Bug

Re: Sharing RDDS across applications and users

2016-10-28 Thread Chanh Le
6AcPCCdOABUrV8Pw> > > http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/> > > Disclaimer: Use it at your own risk. Any and all responsibility for any loss, > damage or destruction of data or any other property which may arise from > relying on t

Re: Sharing RDDS across applications and users

2016-10-27 Thread Chanh Le
r will in no case be liable for any monetary damages arising from such > loss, damage or destruction. > > > On 27 October 2016 at 11:29, Chanh Le <mailto:giaosu...@gmail.com>> wrote: > Hi Mich, > Alluxio is the good option to go. > > Regards, > Chanh > >

Re: Sharing RDDS across applications and users

2016-10-27 Thread Chanh Le
Hi Mich, Alluxio is the good option to go. Regards, Chanh > On Oct 27, 2016, at 5:28 PM, Mich Talebzadeh > wrote: > > > There was a mention of using Zeppelin to share RDDs with many users. From the > notes on Zeppelin it appears that this is sharing UI and I am not sure how > easy it is go

Re: How to make Mesos Cluster Dispatcher of Spark 1.6.1 load my config files?

2016-10-19 Thread Chanh Le
; > I found a workaround that works to me: > http://stackoverflow.com/questions/29552799/spark-unable-to-find-jdbc-driver/40114125#40114125 > > <http://stackoverflow.com/questions/29552799/spark-unable-to-find-jdbc-driver/40114125#40114125> > > Regards, > Daniel >

Re: Best Savemode option to write Parquet file

2016-10-06 Thread Chanh Le
d do other modes do the same or all executors write to the folder in > parallel . > > Thank You, > Anu > > On Thu, Oct 6, 2016 at 11:36 AM, Chanh Le <mailto:giaosu...@gmail.com>> wrote: > Hi Abnubhav, > The best way to store parquet is partition it by time or speci

Re: Best Savemode option to write Parquet file

2016-10-06 Thread Chanh Le
Hi Abnubhav, The best way to store Parquet is to partition it by time, or by a specific field that you plan to delete by later. In my case I partition my data by time so I can easily delete the data after 30 days. Use Append mode and disable the summary information: sc.hadoopCon
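A minimal Scala sketch of the write pattern described above, assuming an existing SparkContext `sc`, a DataFrame `df` with a `date` column, and a hypothetical output path:

    import org.apache.spark.sql.SaveMode

    // Skip the Parquet summary-metadata files so repeated appends stay cheap.
    sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")

    df.write
      .mode(SaveMode.Append)       // append each new batch
      .partitionBy("date")         // time-based partitions make old data easy to drop
      .parquet("/data/hourly")     // hypothetical output path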

How to make Mesos Cluster Dispatcher of Spark 1.6.1 load my config files?

2016-10-05 Thread Chanh Le
Hi everyone, I have the same config in both modes, and I really want to be able to change the config whenever I run, so I created a config file and run my application with it. My problem is: it works with these configs without using the Mesos Cluster Dispatcher. /build/analytics/spark-1.6.1-bin-hadoop2.6/bin/spark-s

Re: Why does adding --driver-class-path jdbc.jar work and --jars not? (1.6.1)

2016-10-05 Thread Chanh Le
drivers need to be in the system classpath. --jars > places them in an app-specific class loader, so it doesn't work. > > On Wed, Oct 5, 2016 at 3:32 AM, Chanh Le wrote: >> Hi everyone, >> I just wondering why when I run my program I need to add jdbc.jar into >>

Why does adding --driver-class-path jdbc.jar work and --jars not? (1.6.1)

2016-10-05 Thread Chanh Le
Hi everyone, I am just wondering why, when I run my program, I need to add jdbc.jar to --driver-class-path instead of treating it like a dependency via --jars. My program works with this config: ./bin/spark-submit --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.1 --master "local[4]" --class com.a

Re: What is the difference between mini-batch vs real time streaming in practice (not theory)?

2016-09-27 Thread Chanh Le
The difference between streaming and micro-batch is about the ordering of messages > Spark Streaming guarantees ordered processing of RDDs in one DStream. Since > each RDD is processed in parallel, there is no order guaranteed within the > RDD. This is a tradeoff design Spark made. If you want to process

Re: Using Zeppelin with Spark FP

2016-09-15 Thread Chanh Le
at your own risk. Any and all responsibility for any loss, > damage or destruction of data or any other property which may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages

Re: Using Zeppelin with Spark FP

2016-09-15 Thread Chanh Le
Hi, I am using the Zeppelin 0.7 snapshot and it works well with both Spark 2.0 and the STS of Spark 2.0. > On Sep 12, 2016, at 4:38 PM, Mich Talebzadeh > wrote: > > Hi Sachin, > > Downloaded Zeppelin 0.6.1 > > Now I can see the plot in a tabular format and graph. it looks good. Many > thanks > > >

Re: Zeppelin patterns with the streaming data

2016-09-13 Thread Chanh Le
Hi Mich, I think it can http://www.quartz-scheduler.org/documentation/quartz-2.1.x/tutorials/crontrigger > On Sep 13, 2016, at 1:57 PM, Mich Talebzadeh > wrote: > > Thanks Sachin. > > The cron gives the gr

Re: Spark 2.0.0 Thrift Server problem with Hive metastore

2016-09-06 Thread Chanh Le
Did anyone use the STS of Spark 2.0 in production? As for me, I am still waiting for compatibility with Parquet files created by Spark 1.6.1. > On Sep 6, 2016, at 2:46 PM, Campagnola, Francesco > wrote: > > I mean I have installed Spark 2.0 in the same environment where the Spark 1.6 > thrift server was runn

Re: Design patterns involving Spark

2016-08-29 Thread Chanh Le
Hi everyone, It seems a lot of people are using Druid for real-time dashboards. I'm just wondering about using Druid as the main storage engine, because Druid can store the raw data and can also integrate with Spark (in theory). In that case do we need to keep 2 separate storages: Druid (storing segments in HDFS) a

Re: Best practises to storing data in Parquet files

2016-08-28 Thread Chanh Le
> Does a Parquet file have a size limit (1 TB)? I didn't see any problem, but 1 TB is too big to operate on and needs to be divided into smaller pieces. > Should we use SaveMode.APPEND for a long-running streaming app? Yes, but you need to partition it by time so it is easy to maintain, e.g. to update or delete a spec
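A hedged sketch of the maintenance step implied above, assuming an existing SparkContext `sc` and a time-partitioned layout such as /data/hourly/date=YYYY-MM-DD (the path is illustrative): old data is removed by deleting the partition directory.

    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(sc.hadoopConfiguration)
    // Drop one day's partition, e.g. anything older than 30 days.
    fs.delete(new Path("/data/hourly/date=2016-06-01"), true)   // recursive = true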

Re: Anyone else having trouble with replicated off heap RDD persistence?

2016-08-16 Thread Chanh Le
Hi Michael, You should use Alluxio instead. http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html It should be easier. Regards, Chanh > On Aug 17, 2016, at 5:45 AM, Michael Allman wrote: > > Hello, > > A c

Re: Does Spark SQL support indexes?

2016-08-13 Thread Chanh Le
Hi Taotao, Spark SQL doesn’t support indexes :). > On Aug 14, 2016, at 10:03 AM, Taotao.Li wrote: > > > hi, guys, does Spark SQL support indexes? if so, how can I create an index > on my temp table? if not, how can I handle some specific queries on a very > large table? it would iterate al

Re: Spark Thrift Server (Spark 2.0) show table has value with NULL in all fields

2016-08-10 Thread Chanh Le
Hi Gene, It's a Spark 2.0 issue. I switched to Spark 1.6.1 and it's OK now. Thanks. On Thursday, July 28, 2016 at 4:25:48 PM UTC+7, Chanh Le wrote: > > Hi everyone, > > I have a problem when I create an external table in Spark Thrift Server (STS) > and query the data. &

Re: hdfs persist rollbacks when spark job is killed

2016-08-07 Thread Chanh Le
is no move operation. > > I generally have a set of Data Quality checks after each job to ascertain > whether everything went fine, the results are stored so that it can be > published in a graph for monitoring, thus solving two purposes. > > > Regards, > Gourav Sengupt

Re: hdfs persist rollbacks when spark job is killed

2016-08-07 Thread Chanh Le
It’s out of the box in Spark. When you write data into HDFS or any storage, it only creates the new Parquet folder properly if your Spark job succeeded; otherwise there is only a _temporary folder inside, marking that it did not succeed (Spark was killed), or nothing inside (the Spark job failed). > On Aug 8, 2016,
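A minimal sketch of how that can be checked, assuming an existing SparkContext `sc` and an illustrative output directory: a completed write leaves a _SUCCESS marker, while an interrupted one leaves only _temporary behind.

    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(sc.hadoopConfiguration)
    val out = new Path("/data/output")                       // hypothetical output directory
    val committed = fs.exists(new Path(out, "_SUCCESS"))      // present only if the job finished
    val inFlight  = fs.exists(new Path(out, "_temporary"))    // left behind if the job was killed mid-write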

Re: [Spark1.6] Or (||) operator not working in DataFrame

2016-08-07 Thread Chanh Le
You should use df.where(conditionExpr), which is more convenient for expressing simple terms in SQL. /** * Filters rows using the given SQL expression. * {{{ * peopleDf.where("age > 15") * }}} * @group dfops * @since 1.5.0 */ def where(conditionExpr: String): DataFrame = { filter(Colu
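For example (DataFrame and column names are illustrative), the OR from the thread title can be expressed as a single SQL string:

    // where(...) is just filter(...) taking a SQL expression.
    val matched = peopleDf.where("age > 15 OR country = 'VN'")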

Re: [Spark 2.0] Problem with Spark Thrift Server show NULL instead of showing BIGINT value

2016-08-04 Thread Chanh Le
I checked with Spark 1.6.1 and it still works fine. I also checked out the latest source code on the Spark 2.0 branch, built it, and got the same issue. I think it is because of the API change to Dataset in Spark 2.0? Regards, Chanh > On Aug 5, 2016, at 9:44 AM, Chanh Le wrote: > > Hi Nicholas, > Th

Re: [Spark 2.0] Problem with Spark Thrift Server show NULL instead of showing BIGINT value

2016-08-04 Thread Chanh Le
Szandor Hakobian, Ph.D. > Data Scientist > Rally Health > nicholas.hakob...@rallyhealth.com <mailto:nicholas.hakob...@rallyhealth.com> > > > On Thu, Aug 4, 2016 at 4:53 AM, Chanh Le <mailto:giaosu...@gmail.com>> wrote: > Hi Takeshi, > I already have changed the colum

Re: [Thriftserver2] Controlling number of tasks

2016-08-03 Thread Chanh Le
I believe there is no way to reduce the number of tasks from Hive using coalesce, because Hive just reads the files, and the task count depends on the number of files you put in. So the way I did it was to coalesce at the ETL layer: write as few files as possible to reduce the IO time for reading them. > On Aug 3, 201
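A hedged sketch of the write-side coalesce described above (partition count and path are illustrative):

    import org.apache.spark.sql.SaveMode

    // Fewer, larger files at write time means fewer splits/tasks for Hive or STS to read later.
    df.coalesce(8)
      .write
      .mode(SaveMode.Append)
      .parquet("/data/hourly")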

Re: Is there a way to configure a default limit for queries on STS?

2016-08-02 Thread Chanh Le
data or any other property which may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from such > loss, damage or destruction. > > > On 2 August 2016 at 10:18, Chan

Re: Is there a way to configure a default limit for queries on STS?

2016-08-02 Thread Chanh Le
author will in no case be liable for any monetary damages arising from such > loss, damage or destruction. > > > On 2 August 2016 at 10:18, Chanh Le <mailto:giaosu...@gmail.com>> wrote: > I tried and it works perfectly. > > Regards, > Chanh > > >

Re: Is there a way to configure a default limit for queries on STS?

2016-08-02 Thread Chanh Le
citly disclaimed. The > author will in no case be liable for any monetary damages arising from such > loss, damage or destruction. > > > On 2 August 2016 at 09:13, Chanh Le <mailto:giaosu...@gmail.com>> wrote: > Hi Mich, > I use Spark Thrift Server basically it acts

Re: Is there a way to configure a default limit for queries on STS?

2016-08-02 Thread Chanh Le
elying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from such > loss, damage or destruction. > > > On 2 August 2016 at 08:41, Chanh Le <mailto:giaosu...@gmail.com>> wrote: &g

Is there a way to configure a default limit for queries on STS?

2016-08-02 Thread Chanh Le
Hi everyone, I set up STS and use Zeppelin to query data through a JDBC connection. A problem we are facing is that users usually forget to put a LIMIT in the query, so it hangs the cluster. SELECT * FROM tableA; Is there any way to configure a limit by default? Regards, Chanh -

Re: [Spark 2.0] Why MutableInt cannot be cast to MutableLong?

2016-07-31 Thread Chanh Le
think this error still happen in Spark 2.0 > On Aug 1, 2016, at 9:21 AM, Chanh Le wrote: > > Sorry my bad, I ran in Spark 1.6.1 but what about this error? > Why Int cannot be cast to Long? > > > Thanks. > > >> On Aug 1, 2016, at 2:44 AM, Michael Armbrust

Re: [Spark 2.0] Why MutableInt cannot be cast to MutableLong?

2016-07-31 Thread Chanh Le
12354 > <https://github.com/apache/spark/pull/12354>. > > On Sun, Jul 31, 2016 at 2:12 AM, Chanh Le <mailto:giaosu...@gmail.com>> wrote: > Hi everyone, > Why MutableInt cannot be cast to MutableLong? > It’s really weird and seems Spark 2.0

[Spark 2.0] Why MutableInt cannot be cast to MutableLong?

2016-07-31 Thread Chanh Le
Hi everyone, Why can MutableInt not be cast to MutableLong? It’s really weird, and it seems Spark 2.0 has a lot of errors with the Parquet format. org.apache.spark.sql.catalyst.expressions.MutableInt cannot be cast to org.apache.spark.sql.catalyst.expressions.MutableLong Caused by: org.apache.parq

[jira] [Comment Edited] (SPARK-16518) Schema Compatibility of Parquet Data Source

2016-07-30 Thread Chanh Le (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400755#comment-15400755 ] Chanh Le edited comment on SPARK-16518 at 7/30/16 5:3

[jira] [Commented] (SPARK-16518) Schema Compatibility of Parquet Data Source

2016-07-30 Thread Chanh Le (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400755#comment-15400755 ] Chanh Le commented on SPARK-16518: -- Did we have a patch for that? Right now I have

Re: Spark Thrift Server (Spark 2.0) show table has value with NULL in all fields

2016-07-30 Thread Chanh Le
Hi Mich, something is different in your log: > On Jul 30, 2016, at 6:58 PM, Mich Talebzadeh > wrote: > > parquet-mr version 1.6.0 > org.apache.parquet.VersionParser$VersionParseException: Could not parse > created_by: parquet-mr version 1.6.0 using format: (.+) version ((.*) > )?\(build ?(.*)

Re: Spark Thrift Server (Spark 2.0) show table has value with NULL in all fields

2016-07-30 Thread Chanh Le
s.com <http://talebzadehmich.wordpress.com/> > > Disclaimer: Use it at your own risk. Any and all responsibility for any loss, > damage or destruction of data or any other property which may arise from > relying on this email's technical content is explicitly disclaimed

Re: Spark Thrift Server (Spark 2.0) show table has value with NULL in all fields

2016-07-30 Thread Chanh Le
isclaimed. The > author will in no case be liable for any monetary damages arising from such > loss, damage or destruction. > > > On 30 July 2016 at 11:43, Chanh Le <mailto:giaosu...@gmail.com>> wrote: > Hi Mich, > Thanks for supporting. Here some of my thoughts.

Re: Spark Thrift Server (Spark 2.0) show table has value with NULL in all fields

2016-07-30 Thread Chanh Le
2gBxianrbJd6zP6AcPCCdOABUrV8Pw > > http://talebzadehmich.wordpress.com > > Disclaimer: Use it at your own risk. Any and all responsibility for any loss, > damage or destruction of data or any other property which may arise from > relying on this email's technica

Re: Spark Standalone Cluster: Having a master and worker on the same node

2016-07-28 Thread Chanh Le
Hi Jestin, I've seen that most setups usually run the master and a worker together on the same node, because the master doesn't do as much work as a worker does, and resources are expensive so we should use them. BTW, in my setup I run the master and worker together: I have 5 nodes, and 3 of them are master-and-worker nodes running al

Re: Spark Thrift Server 2.0 set spark.sql.shuffle.partitions not working when query

2016-07-28 Thread Chanh Le
Thank you Takeshi it works fine now. Regards, Chanh > On Jul 28, 2016, at 2:03 PM, Takeshi Yamamuro wrote: > > Hi, > > you need to set the value when you just start the server. > > // maropu > > On Thu, Jul 28, 2016 at 3:59 PM, Chanh Le <mailto:giaosu...@gm

Spark 2.0 just released

2016-07-26 Thread Chanh Le
It's official now: http://spark.apache.org/releases/spark-release-2-0-0.html Everyone should check it out.

Re: Spark Web UI port 4040 not working

2016-07-26 Thread Chanh Le
You’re running in standalone mode? Usually inside the active task it will show the address of the current job, or you can check on the master node by using netstat -apn | grep 4040 > On Jul 26, 2016, at 8:21 AM, Jestin Ma wrote: > > Hello, when running spark jobs, I can access the master UI (port 8080 one

Re: dataframe.foreach VS dataframe.collect().foreach

2016-07-26 Thread Chanh Le
Hi Ken, blacklistDF is just a DataFrame. Spark is lazy: until you call something like collect, take, or write, it won't execute the whole process (e.g. the map or filter you did before the collect). That means until you call collect, Spark does nothing, so your df doesn't have any data yet -> you can’t call foreach on it. Call c
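A small sketch of the distinction being made (names are illustrative): transformations are lazy, collect() is an action that brings the rows to the driver, and foreach on the DataFrame itself runs on the executors.

    val blacklistDF = df.filter(df("status") === "blacklisted")   // lazy: nothing has executed yet

    blacklistDF.collect().foreach(println)    // action: runs the job, prints on the driver
    blacklistDF.foreach(row => println(row))  // also an action, but the println happens on the executors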

Re: Optimize filter operations with sorted data

2016-07-21 Thread Chanh Le
filter ? > > 2016-07-07 11:58 GMT+02:00 Chanh Le : > >> Hi Tan, >> It depends on how data organise and what your filter is. >> For example in my case: I store data by partition by field time and >> network_id. If I filter by time or network_id or both and with othe

Re: run spark apps in linux crontab

2016-07-20 Thread Chanh Le
> > Thanks & Best regards! > San.Luo > > - Original message - > From: Chanh Le > To: luohui20...@sina.com > Cc: focus, user > Subject: Re: run spark apps in linux crontab > Date: 2016-07-21 11:38 > > you should use command.sh | tee file.log >

Re: run spark apps in linux crontab

2016-07-20 Thread Chanh Le
you should use command.sh | tee file.log > On Jul 21, 2016, at 10:36 AM, > wrote: > > > thank you focus, and all. > this problem was solved by adding the line ". /etc/profile" to my shell script. > > > > > Thanks & Best regards! > San.Luo > > - Original message - > From:

Attribute name "sum(proceeds)" contains invalid character(s) among " ,;{}()\n\t="

2016-07-20 Thread Chanh Le
Hi everybody, I got an error because a column name does not follow the naming rule. Please tell me how to fix it. Here is my code; metricFields is a Seq of metrics: spent, proceed, click, impression. sqlContext .sql(s"select * from hourly where time between '$dateStr-00' and '$dateStr
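A common fix, sketched here with the metric names from the post (the grouping column and source DataFrame are assumptions), is to alias each aggregate so the written schema gets plain column names instead of e.g. sum(proceeds):

    import org.apache.spark.sql.functions.sum

    val metricFields = Seq("spent", "proceed", "click", "impression")
    val aggs = metricFields.map(m => sum(m).alias(m))   // "sum(proceed)" -> "proceed"

    val hourlyAgg = df.groupBy("time").agg(aggs.head, aggs.tail: _*)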

[jira] [Created] (MESOS-5868) Task is running but not shown in UI

2016-07-19 Thread Chanh Le (JIRA)
Chanh Le created MESOS-5868: --- Summary: Task is running but not show in UI Key: MESOS-5868 URL: https://issues.apache.org/jira/browse/MESOS-5868 Project: Mesos Issue Type: Bug Components

[jira] [Updated] (MESOS-5868) Task is running but not shown in UI

2016-07-19 Thread Chanh Le (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chanh Le updated MESOS-5868: Description: This happens when I try to restart the master nodes without bringing down any slaves. As you can see

Re: the spark job is so slow - almost frozen

2016-07-18 Thread Chanh Le
Hi, What about the network (bandwidth) between Hive and Spark? Did it run in Hive before you moved to Spark? Because it's complex, you can use something like the EXPLAIN command to show what's going on. > On Jul 18, 2016, at 5:20 PM, Zhiliang Zhu wrote: > > the sql logic in the program is v

Re: Inode for STS

2016-07-18 Thread Chanh Le
Hi Ayan, It seems like you mean this: https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.start.cleanup.scratchdir

Re: How to run Zeppelin and Spark Thrift Server Together

2016-07-17 Thread Chanh Le
Simba is > complaining about. Try to change the protocol to SASL? > > On Fri, Jul 15, 2016 at 1:20 PM, Chanh Le <mailto:giaosu...@gmail.com>> wrote: > Hi Ayan, > Thanks. I got it. > Did you have any problem when connecting Oracle BI with STS? > > I have some error >

[jira] [Created] (ZEPPELIN-1173) Could not set hive.metastore.warehouse.dir property

2016-07-13 Thread Chanh Le (JIRA)
Chanh Le created ZEPPELIN-1173: -- Summary: Could not set hive.metastore.warehouse.dir property Key: ZEPPELIN-1173 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1173 Project: Zeppelin

Re: Memory issue java.lang.OutOfMemoryError: Java heap space

2016-07-13 Thread Chanh Le
uction" was a little more complex that I > thought. > >> On Jul 13, 2016, at 10:35 PM, Chanh Le > <mailto:giaosu...@gmail.com>> wrote: >> >> Hi Jean, >> How do you run your Spark Application? Local Mode, Cluster Mode? >> If you run in local mode d

Re: Memory issue java.lang.OutOfMemoryError: Java heap space

2016-07-13 Thread Chanh Le
Hi Jean, How do you run your Spark application? Local mode or cluster mode? If you run in local mode, did you use --driver-memory and --executor-memory? In local mode your executor and driver settings may not work the way you expect. > On Jul 14, 2016, at 8:43 AM, Jean Georges Perrin w

Re: How to run Zeppelin and Spark Thrift Server Together

2016-07-13 Thread Chanh Le
properties (same way you'd do in hive cli) > c. You can create tables/databases with a LOCATION clause, in case you need > to use non-standard path. > > Best > Ayan > > On Wed, Jul 13, 2016 at 3:20 PM, Chanh Le <mailto:giaosu...@gmail.com>> wrote: >

Re: How to run Zeppelin and Spark Thrift Server Together

2016-07-12 Thread Chanh Le
add it to zeppelin :) > > Best > Ayan > > On Wed, Jul 13, 2016 at 1:53 PM, Chanh Le <mailto:giaosu...@gmail.com>> wrote: > Hi Ayan, > How to set hive metastore in Zeppelin. I tried but not success. > The way I do I add into Spark Interpreter > > >

Re: Spark cache behaviour when the source table is modified

2016-07-12 Thread Chanh Le
Hi Anjali, The cache is immutable; you can’t update data in it. The way to update the cache is to re-create it. > On Jun 16, 2016, at 4:24 PM, Anjali Chadha wrote: > > Hi all, > > I am having a hard time understanding the caching concepts in Spark. > > I have a hive table("person"), which is cac
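A minimal sketch of the re-create step, assuming the table name from the thread and an existing sqlContext:

    // The cached copy does not see later changes to the underlying table,
    // so drop it and cache again after the source is modified.
    sqlContext.uncacheTable("person")
    sqlContext.cacheTable("person")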

Re: Connection via JDBC to Oracle hangs after count call

2016-07-11 Thread Chanh Le
Hi Mich, I have a stored procedure in Oracle written like this (an SP to get info): PKG_ETL.GET_OBJECTS_INFO( p_LAST_UPDATED VARCHAR2, p_OBJECT_TYPE VARCHAR2, p_TABLE OUT SYS_REFCURSOR); How do I call it from Spark, given that the output is a cursor (p_TABLE OUT SYS_REFCURSOR)? Thanks.

Re: Zeppelin Spark with Dynamic Allocation

2016-07-11 Thread Chanh Le
gt; Data Analyst > Skype: tromika > E-mail: tamas.szur...@odigeo.com <mailto:n...@odigeo.com> > > ODIGEO Hungary Kft. > 1066 Budapest > Weiner Leó u. 16. > www.liligo.com  <http://www.liligo.com/> > check out our newest video  <http://www.youtube.com/user/l

Zeppelin Spark with Dynamic Allocation

2016-07-11 Thread Chanh Le
Hi everybody, I am testing Zeppelin with dynamic allocation, but it seems it’s not working. From the logs I received, I saw that the Spark context was created successfully and a task was running, but after that it was terminated. Any ideas on that? Thanks. INFO [2016-07-11 15:03:40,096] ({Thread-0} RemoteInter
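For reference, a hedged sketch of the minimum settings dynamic allocation needs (values are illustrative, and the external shuffle service must also be running on the workers):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")          // required by dynamic allocation
      .set("spark.dynamicAllocation.minExecutors", "1")
      .set("spark.dynamicAllocation.maxExecutors", "10")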

Re: How to run Zeppelin and Spark Thrift Server Together

2016-07-10 Thread Chanh Le
gt; > On Mon, Jul 11, 2016 at 12:01 PM, ayan guha <mailto:guha.a...@gmail.com>> wrote: > Hi > > Can you try using JDBC interpreter with STS? We are using Zeppelin+STS on > YARN for few months now without much issue. > > On Mon, Jul 11, 2016 at 12:48 PM, Chanh Le

Re: How to run Zeppelin and Spark Thrift Server Together

2016-07-10 Thread Chanh Le
gt; > On Mon, Jul 11, 2016 at 12:48 PM, Chanh Le <mailto:giaosu...@gmail.com>> wrote: > Hi everybody, > We are using Spark to query big data and currently we’re using Zeppelin to > provide a UI for technical users. > Now we also need to provide a UI for business users so

How to run Zeppelin and Spark Thrift Server Together

2016-07-10 Thread Chanh Le
Hi everybody, We are using Spark to query big data, and currently we’re using Zeppelin to provide a UI for technical users. Now we also need to provide a UI for business users, so we use Oracle BI tools and set up a Spark Thrift Server (STS) for them. When I run both Zeppelin and STS it throws an error: I

Re: problem making Zeppelin 0.6 work with Spark 1.6.1, throwing jackson.databind.JsonMappingException exception

2016-07-09 Thread Chanh Le
Hi, This is weird, because I have been using Zeppelin since version 0.5.6 and upgraded to 0.6.0 a couple of days ago, and both work fine with Spark 1.6.1. For 0.6.0 I am using zeppelin-0.6.0-bin-netinst. > On Jul 9, 2016, at 9:25 PM, Mich Talebzadeh wrote: > > Hi, > > I just installed the latest Zeppelin

Re: Why so many parquet file part when I store data in Alluxio or File?

2016-07-08 Thread Chanh Le
artition of the files. > > Hope that helps, > Gene > > On Sun, Jul 3, 2016 at 8:02 PM, Chanh Le <mailto:giaosu...@gmail.com>> wrote: > Hi Gene, > Could you give some suggestions on that? > > > >> On Jul 1, 2016, at 5:31 PM, Ted Yu > <mailto:yuz

Re: Any ways to connect BI tool to Spark without Hive

2016-07-07 Thread Chanh Le
. > > One thing to note: many BI tools like QlikSense and Tableau (not sure about the Oracle > BI tool) query and then cache data on the client side. This works really well > in real life. > > > On Fri, Jul 8, 2016 at 1:58 PM, Chanh Le <mailto:giaosu...@gmail.com>> wrote:

Re: Any ways to connect BI tool to Spark without Hive

2016-07-07 Thread Chanh Le
's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from such > loss, damage or destruction. > > > On 8 July 2016 at 04:58, Chanh Le <mailto:giaosu...@gmail.com>> wrote: > Hi Mich, > Thanks for replyin

Re: Any ways to connect BI tool to Spark without Hive

2016-07-07 Thread Chanh Le
may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from such > loss, damage or destruction. > > > On 8 July 2016 at 04:19, Chanh Le <mailto:giaosu...@gmail.com

Any ways to connect BI tool to Spark without Hive

2016-07-07 Thread Chanh Le
Hi everyone, Currently we use Zeppelin to analyze our data, and because it uses SQL it’s hard to roll out to users. But users are using some kind of Oracle BI tool for analytics, because it supports drag and drop and we can set per-user permissions. Our archite

Re: Optimize filter operations with sorted data

2016-07-07 Thread Chanh Le
Hi Tan, It depends on how the data is organised and what your filter is. For example, in my case I store data partitioned by the fields time and network_id. If I filter by time or network_id (or both) together with other fields, Spark only loads the partitions matching the time and network_id in the filter and then filters the rest. > On Jul 7, 2
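A sketch of that layout and query pattern (paths and values are illustrative): filters on the partition columns let Spark read only the matching directories, and the remaining predicates are applied to those rows.

    // Write partitioned by the fields that appear in filters.
    df.write.partitionBy("time", "network_id").parquet("/data/events")

    // Only partitions matching time/network_id are loaded; the clicks predicate runs on those rows.
    val part = sqlContext.read.parquet("/data/events")
      .filter("time = '2016-07-07' AND network_id = 123 AND clicks > 10")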

Re: Why so many parquet file part when I store data in Alluxio or File?

2016-07-03 Thread Chanh Le
Hi Gene, Could you give some suggestions on that? > On Jul 1, 2016, at 5:31 PM, Ted Yu wrote: > > The comment from zhangxiongfei was from a year ago. > > Maybe something changed since them ? > > On Fri, Jul 1, 2016 at 12:07 AM, Chanh Le <mailto:giaosu...@gmail.co

Re: Why so many parquet file part when I store data in Alluxio or File?

2016-06-30 Thread Chanh Le
titions to separate part files. > > Thanks > Deepak > > On 1 Jul 2016 8:01 am, "Chanh Le" <mailto:giaosu...@gmail.com>> wrote: > Hi everyone, > I am using Alluxio for storage. But I am little bit confuse why I am do set > block size of alluxio is 512MB an

Re: Looking for help about stackoverflow in spark

2016-06-30 Thread Chanh Le
Hi John, I think it relates to driver memory more than the other things you mentioned. Can you just give the driver more memory? > On Jul 1, 2016, at 9:03 AM, johnzeng wrote: > > I am trying to load a 1 TB collection into the spark cluster from mongo. But I > keep getting stack overflow error

Re: Best practice for handing tables between pipeline components

2016-06-29 Thread Chanh Le
Hi Everett, We have been using Alluxio for the last 2 months. We use Alluxio for sharing data between Spark jobs, isolating Spark to the processing layer and Alluxio to the storage layer. > On Jun 29, 2016, at 2:52 AM, Everett Anderson > wrote: > > Thanks! Alluxio looks quite promising, but also

Re: Spark 2.0 Preview After caching query didn't work and can't kill job.

2016-06-15 Thread Chanh Le
156) at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2155) at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2449) at org.apache.spark.sql.Dataset.count(Dataset.scala:2155) ... 48 elided I lost all my executors. > On Jun 15, 2016, at 8:44 PM, Chanh Le wrote:

Re: Spark 2.0 Preview After caching query didn't work and can't kill job.

2016-06-15 Thread Chanh Le
n Tue, Jun 14, 2016 at 3:45 AM, Chanh Le <mailto:giaosu...@gmail.com>> wrote: > I am testing Spark 2.0 > I load data from alluxio and cached then I query but the first query is ok > because it kick off cache action. But after that I run the query again and > it’s stuck.

Spark 2.0 Preview After caching query didn't work and can't kill job.

2016-06-14 Thread Chanh Le
I am testing Spark 2.0. I load data from Alluxio and cache it, then I query it. The first query is OK because it kicks off the cache action, but after that, when I run the query again, it gets stuck. I ran on a 5-node cluster in spark-shell. Has anyone had this issue?

Re: Spark Partition by Columns doesn't work properly

2016-06-09 Thread Chanh Le
Ok, thanks. On Thu, Jun 9, 2016, 12:51 PM Jasleen Kaur wrote: > The github repo is https://github.com/datastax/spark-cassandra-connector > > The talk video and slides should be uploaded soon on spark summit website > > > On Wednesday, June 8, 2016, Chanh Le wrote: > >

Re: Spark Partition by Columns doesn't work properly

2016-06-08 Thread Chanh Le
s value > > On Wednesday, June 8, 2016, Chanh Le wrote: > >> Hi everyone, >> I tested the partition by columns of data frame but it’s not good I mean >> wrong. >> I am using Spark 1.6.1 load data from Cassandra. >> I repartition by 2 field date, network_id -

Spark Partition by Columns doesn't work properly

2016-06-08 Thread Chanh Le
Hi everyone, I tested partitioning a DataFrame by columns, but the result is not good; I mean it is wrong. I am using Spark 1.6.1 and load data from Cassandra. I repartition by 2 fields (date, network_id) -> 200 partitions; I repartition by 1 field (date) -> 200 partitions. But my data is 90 days of data -> I mean if we repa
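A short sketch of what is happening (column names from the post): repartitioning by columns hashes rows into spark.sql.shuffle.partitions buckets (200 by default); it does not create one partition per distinct value.

    // 200 output partitions regardless of how many distinct (date, network_id) pairs exist.
    val byBoth = df.repartition(df("date"), df("network_id"))

    // To control the count, pass it explicitly (the number here is illustrative).
    val byDate90 = df.repartition(90, df("date"))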
