Re: Re: Spark 1.6.0 + Hive + HBase

2016-01-28 Thread Ted Yu
Under sql/hive/src/main/scala/org/apache/spark/sql/hive/execution, I only see HiveTableScan and HiveNativeCommand. At the beginning of HiveTableScan: "The Hive table scan operator. Column and partition pruning are both handled." Looks like filter pushdown hasn't been implemented. As far as I
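A quick way to verify this on a given query (a minimal sketch; the table name is illustrative and `sc` is assumed to be an existing SparkContext):

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)
    val df = hiveContext.sql("SELECT * FROM hbase_backed_table WHERE key = 'k1'")
    // If pushdown were implemented, the predicate would appear inside the scan;
    // instead it shows up as a separate Filter node above HiveTableScan.
    df.explain(true)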

Re: build error: code too big: specialStateTransition(int, IntStream)

2016-01-28 Thread Herman van Hövell tot Westerflier
Hi, I have only encountered 'code too large' errors when changing grammars. I am using SBT/IDEA, no Eclipse. The size of an ANTLR parser/lexer depends on the rules inside the source grammar and the rules it depends on. So we should take a look at IdentifiersParser.g/ExpressionParser.g;

Re: build error: code too big: specialStateTransition(int, IntStream)

2016-01-28 Thread Iulian Dragoș
Thanks for the pointer. It seems to be a really pathological case, since the file that's in error is part of the split file (the smaller one, IdentifiersParser). I'll see if I can work around it by splitting it some more. iulian On Thu, Jan 28, 2016 at 4:43 PM, Ted Yu

Re: Re: Spark 1.6.0 + Hive + HBase

2016-01-28 Thread 开心延年
Thanks Ted, I will try this version. ------ Original message ------ From: "Ted Yu"; Sent: Thursday, January 28, 2016, 11:35 PM; To: "开心延年"; Cc: "Jörn Franke"; "Julio Antonio Soto de Vicente"; "Maciej

Re: build error: code too big: specialStateTransition(int, IntStream)

2016-01-28 Thread Ted Yu
After this change: [SPARK-12681] [SQL] split IdentifiersParser.g into two files, the biggest file under sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser is SparkSqlParser.g. Maybe split SparkSqlParser.g up as well? On Thu, Jan 28, 2016 at 5:21 AM, Iulian Dragoș

Re: Data not getting printed in Spark Streaming with print().

2016-01-28 Thread Shixiong(Ryan) Zhu
fileStream has a parameter "newFilesOnly". By default it's true, which means processing only new files and ignoring existing files in the directory. So you need to ***move*** the files into the directory; otherwise it will ignore existing files. You can also set "newFilesOnly" to false. Then in the
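A minimal sketch of the second option, assuming Spark Streaming 1.6 and an illustrative input path:

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(10)) // `sc`: existing SparkContext
    val lines = ssc
      .fileStream[LongWritable, Text, TextInputFormat](
        "hdfs:///user/example/input",                  // illustrative path
        (path: Path) => !path.getName.startsWith("."), // skip hidden files
        newFilesOnly = false)                          // process pre-existing files too
      .map(_._2.toString)
    lines.print()
    ssc.start()
    ssc.awaitTermination()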

Data not getting printed in Spark Streaming with print().

2016-01-28 Thread satyajit vegesna
Hi All, I am trying to run the HdfsWordCount example from GitHub: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala I am using Ubuntu to run the program, but don't see any data getting printed after,

Spark 1.6.0 + Hive + HBase

2016-01-28 Thread Maciej Bryński
Hi, I'm trying to run a SQL query on a Hive table which is stored on HBase. I'm using: - Spark 1.6.0 - HDP 2.2 - Hive 0.14.0 - HBase 0.98.4 I managed to configure a working classpath, but I have the following problems: 1) I have a UDF defined in the Hive Metastore (FUNCS table). Spark cannot use it. File
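For problem 1, one commonly suggested workaround sketch, assuming the UDF jar is reachable and using illustrative class/function names, is to register the function as a temporary one per session:

    // hiveContext: an existing org.apache.spark.sql.hive.HiveContext
    hiveContext.sql("ADD JAR /path/to/my-udf.jar") // jar containing the UDF class
    hiveContext.sql("CREATE TEMPORARY FUNCTION my_func AS 'com.example.MyUDF'")
    hiveContext.sql("SELECT my_func(col1) FROM my_table").show()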

Re: Spark 1.6.0 + Hive + HBase

2016-01-28 Thread Ted Yu
For the last two problems, hbase-site.xml seems not to be on the classpath. Once hbase-site.xml is put on the classpath, you should be able to make progress. Cheers > On Jan 28, 2016, at 1:14 AM, Maciej Bryński wrote: > > Hi, > I'm trying to run SQL query on Hive table which is

FPGrowth: adding a stopping criterion (max. literal length or itemset count)

2016-01-28 Thread Tomas Kliegr
Hi all, could anyone provide pointers on how to extend the Spark FPGrowth implementation with either of the following stopping criteria: * maximum number of generated itemsets, * maximum length of generated itemsets (i.e. number of items in an itemset). The second criterion is e.g. available in
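As a stopgap (not a true stopping criterion, since growth still happens internally), one could run MLlib's FPGrowth as-is and filter by itemset length afterwards; a minimal sketch, with an illustrative input path:

    import org.apache.spark.mllib.fpm.FPGrowth
    import org.apache.spark.rdd.RDD

    val transactions: RDD[Array[String]] =
      sc.textFile("data/transactions.txt").map(_.trim.split(' ')) // illustrative path

    val model = new FPGrowth()
      .setMinSupport(0.2)
      .setNumPartitions(10)
      .run(transactions)

    val maxLen = 3 // drop itemsets longer than this, after the fact
    model.freqItemsets
      .filter(_.items.length <= maxLen)
      .collect()
      .foreach(fi => println(fi.items.mkString("[", ",", "]") + ", " + fi.freq))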

Re: Spark 1.6.0 + Hive + HBase

2016-01-28 Thread Julio Antonio Soto de Vicente
Hi, Indeed, Hive is not able to perform predicate pushdown through an HBase table. Neither Hive nor Impala can. Broadly speaking, if you need to query your HBase table through a field other than the rowkey: A) Try to "encode" as much info as possible in the rowkey field and use it as your
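A minimal sketch of option A (field names are borrowed from other messages in this thread; the separator choice is an assumption):

    // Composite rowkey: put the most frequently filtered, most selective
    // fields first so equality/range predicates become HBase prefix scans.
    def makeRowKey(province: String, day: String, id: String): String =
      s"$province|$day|$id"

    // A query for one province and day then maps to a prefix scan over:
    val prefix = makeRowKey("LIAONING", "20151217", "")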

Re: Why does Spark SQL miss the TableScanDesc.FILTER_EXPR_CONF_STR param when I move a Hive table to Spark?

2016-01-28 Thread 开心延年
We always use SQL like the one below: select count(*) from ydb_example_shu where ydbpartion='20151110' and (ydb_sex='' or ydb_province='LIAONING' or ydb_day>='20151217') limit 10 Spark doesn't push down predicates via TableScanDesc.FILTER_EXPR_CONF_STR, which means that every query is a full scan; can't

Re: Re: Spark 1.6.0 + Hive + HBase

2016-01-28 Thread Jörn Franke
Probably a newer Hive version makes a lot of sense here - at least 1.2.1. What storage format are you using? I think the old Hive version had a bug where it always scanned all partitions unless you limited it in the ON clause of the query to a certain partition (e.g. on date=20201119) > On 28 Jan
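Illustratively (table and column names are assumptions, not from the original thread), the workaround described would pin the partition in the join's ON clause:

    // hiveContext: an existing HiveContext; `dt` is the partition column here
    hiveContext.sql(
      """SELECT f.id, d.name
        |FROM facts f
        |JOIN dims d
        |  ON f.id = d.id AND f.dt = '20160128'
        |""".stripMargin).show()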

build error: code too big: specialStateTransition(int, IntStream)

2016-01-28 Thread Iulian Dragoș
Hi, has anyone seen this error? The code of method specialStateTransition(int, IntStream) is exceeding the 65535 bytes limit (SparkSqlParser_IdentifiersParser.java:39907). The error is in ANTLR-generated files and it’s (according to Stack Overflow) due to state explosion in the parser (or lexer).

Re: Why does Spark SQL miss the TableScanDesc.FILTER_EXPR_CONF_STR param when I move a Hive table to Spark?

2016-01-28 Thread 开心延年
If we supported TableScanDesc.FILTER_EXPR_CONF_STR like Hive, we could write SQL like this: select ydb_sex from ydb_example_shu where ydbpartion='20151110' limit 10 select ydb_sex from ydb_example_shu where ydbpartion='20151110' and (ydb_sex='??' or ydb_province='' or ydb_day>='20151217') limit

Re: Spark 1.6.0 + Hive + HBase

2016-01-28 Thread 开心延年
Is there anybody who can solve Problem 4? Thanks. Problem 4) Spark doesn't push down predicates for HiveTableScan, which means that every query is a full scan. ------ Original message ------ From: "Julio Antonio Soto de Vicente"; Sent: Thursday, January 28, 2016

Re: Re: Spark 1.6.0 + Hive + HBase

2016-01-28 Thread 开心延年
This is not Hive's bug. I tested Hive on my storage and it is OK, but when I test it on Spark SQL, the TableScanDesc.FILTER_EXPR_CONF_STR param is not passed, which is what causes the full scan. The relevant source code in HiveHBaseTableInputFormat is as follows: private
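For reference, the gist of the check being described, paraphrased in Scala (the actual Hive code is Java, so this is a sketch rather than the verbatim source):

    import org.apache.hadoop.hive.ql.plan.TableScanDesc
    import org.apache.hadoop.mapred.JobConf

    // If the serialized filter expression is absent from the JobConf, no
    // restriction is applied to the Scan and the read becomes a full table scan.
    def hasPushedFilter(jobConf: JobConf): Boolean =
      jobConf.get(TableScanDesc.FILTER_EXPR_CONF_STR) != null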

Re: Spark 1.6.0 + Hive + HBase

2016-01-28 Thread Maciej Bryński
Ted, you're right. hbase-site.xml resolved problems 2 and 3, but... Problem 4) Spark doesn't push down predicates for HiveTableScan, which means that every query is a full scan. == Physical Plan == TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[count#144L]) +-

Why does Spark SQL miss the TableScanDesc.FILTER_EXPR_CONF_STR param when I move a Hive table to Spark?

2016-01-28 Thread 开心延年
Dear Spark, I am testing a StorageHandler on Spark SQL, but I find that TableScanDesc.FILTER_EXPR_CONF_STR is missing, and I need it. Is there anywhere I could find it? I really want to get some filter information from Spark SQL, so that I could pre-filter using my index; so where is the
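One possible direction, sketched under the assumption that the Spark 1.6 Data Sources API is an option instead of the Hive StorageHandler path: a relation mixing in PrunedFilteredScan receives the pushed-down filters directly, which could feed the index-based pre-filter described above. All names below are illustrative.

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
    import org.apache.spark.sql.types.StructType

    class IndexedRelation(override val sqlContext: SQLContext,
                          override val schema: StructType)
        extends BaseRelation with PrunedFilteredScan {

      override def buildScan(requiredColumns: Array[String],
                             filters: Array[Filter]): RDD[Row] = {
        // `filters` carries pushed-down predicates (EqualTo, GreaterThan, ...)
        // that can be translated into index lookups before any full scan.
        sqlContext.sparkContext.emptyRDD[Row] // placeholder body
      }
    }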

Persisting of DataFrames in transformation workflows

2016-01-28 Thread Gireesh Puthumana
Hi All, I am trying to run a series of transformations over 3 DataFrames. After each transformation, I want to persist the DF and save it to a text file. The steps I am doing are as follows. *Step0:* Create DF1 Create DF2 Create DF3 Create DF4 (no persist, no save yet) *Step1:* Create RESULT-DF1 by
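A minimal sketch of one such persist-then-save step, assuming Spark 1.6 DataFrames; paths, column names, and the join are illustrative stand-ins for the real transformation:

    import org.apache.spark.storage.StorageLevel

    val resultDf1 = df1.join(df2, "id")      // stand-in for the real Step1 transformation
      .persist(StorageLevel.MEMORY_AND_DISK) // keep it around for reuse in later steps
    resultDf1.rdd
      .map(_.mkString("\t"))                 // flatten each Row to a tab-separated line
      .saveAsTextFile("hdfs:///tmp/result-df1")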

Heuristics for Partitioning Non-Local Data

2016-01-28 Thread Hamel Kothari
Hey spark-devs, I'm in the process of writing a DataSource for what is essentially a Java web service. Each relation which we create will consist of a series of queries to this web service which return a pretty much known amount of data (e.g. 2000 rows, 5 string columns, or similar, which we can
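One hedged heuristic sketch for this situation (all numbers below are assumptions, not measurements): when the total size is known up front, derive the partition count from a target per-partition size rather than a fixed default.

    val totalRows = 2000L
    val bytesPerRow = 5 * 32L                    // 5 string columns, assumed avg width
    val targetPartitionBytes = 8L * 1024 * 1024  // e.g. aim for ~8 MB per partition
    val numPartitions =
      math.max(1, ((totalRows * bytesPerRow) / targetPartitionBytes).toInt)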