Re: Storm hive bolt

2017-03-31 Thread Marcin Kasiński
Hi Eugene.

Below you have my pom file.

Can you check it and fix it to use the repositories in the proper way, please?

I've been working on this problem for over 2 weeks and I'm losing hope.


<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
    http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>StormSample</groupId>
  <artifactId>StormSample</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
    <!-- the remaining property names were stripped by the archive; the values were:
         1.0.1, 0.3.0, 0.8.2.2.3.0.0-2557, 1.7.7, 4.11 -->
  </properties>

  <build>
    <sourceDirectory>src</sourceDirectory>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.3</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <configuration>
          <archive>
            <manifest>
              <addClasspath>true</addClasspath>
              <classpathPrefix>lib/</classpathPrefix>
              <mainClass>mk.StormSample</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>1.4</version>
        <configuration>
          <createDependencyReducedPom>true</createDependencyReducedPom>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                    <exclude>defaults.yaml</exclude>
                  </excludes>
                </filter>
              </filters>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

  <dependencies>
    <dependency>
      <groupId>org.apache.storm</groupId>
      <artifactId>storm-hive</artifactId>
      <version>1.0.2</version>
      <exclusions>
        <exclusion>
          <groupId>jline</groupId>
          <artifactId>jline</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.storm</groupId>
      <artifactId>storm-hbase</artifactId>
      <version>1.0.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.storm</groupId>
      <artifactId>storm-core</artifactId>
      <version>1.0.3</version>
      <exclusions>
        <exclusion>
          <artifactId>log4j-over-slf4j</artifactId>
          <groupId>org.slf4j</groupId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka_2.10</artifactId>
      <version>0.10.0.0</version>
      <exclusions>
        <exclusion>
          <groupId>org.apache.zookeeper</groupId>
          <artifactId>zookeeper</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
          <groupId>log4j</groupId>
          <artifactId>log4j</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>log4j-over-slf4j</artifactId>
      <version>1.7.21</version>
    </dependency>
    <dependency>
      <groupId>org.apache.storm</groupId>
      <artifactId>storm-kafka</artifactId>
      <version>1.0.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.storm</groupId>
      <artifactId>storm-jdbc</artifactId>
      <version>1.0.3</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>2.6.0</version>
      <exclusions>
        <exclusion>
          <groupId>ch.qos.logback</groupId>
          <artifactId>logback-classic</artifactId>
        </exclusion>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>servlet-api</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>com.googlecode.json-simple</groupId>
      <artifactId>json-simple</artifactId>
      <version>1.1</version>
    </dependency>
    <dependency>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
      <version>1.2.17</version>
    </dependency>
  </dependencies>

  <repositories>
    <repository>
      <id>hortonworks</id>
      <url>http://repo.hortonworks.com/content/groups/public/org/apache/storm/storm-hive/1.0.1.2.0.1.0-12/</url>
    </repository>
  </repositories>
</project>

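For reference, a Maven repository entry normally points at the repository group root rather than at a single artifact's directory; Maven derives the artifact path itself from groupId/artifactId/version. A hedged sketch of what the repositories section could look like (the group-root URL is an assumption derived from the artifact URL above):

```xml
<repositories>
  <repository>
    <id>hortonworks</id>
    <url>http://repo.hortonworks.com/content/groups/public/</url>
  </repository>
</repositories>
```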

Regards,
Marcin Kasiński
http://itzone.pl


On 30 March 2017 at 17:51, Eugene Koifman  wrote:
> It may be because you are mixing artifacts from HDP/HDF and Apache when
> compiling the topology.
> Can you try using
> http://repo.hortonworks.com/content/groups/public/org/apache/storm/storm-hive/1.0.1.2.0.1.0-12/
> rather than
>
> <dependency>
> <groupId>org.apache.storm</groupId>
> <artifactId>storm-hive</artifactId>
> <version>1.0.3</version>
> </dependency>
>
> Eugene
>
> On 3/29/17, 9:47 AM, "Marcin Kasiński"  wrote:
>
> I've upgraded my environment.
>
> I have Hive on HDP 2.5 (environment 1) and Storm on HDF 2.1 (environment 2).
>
> I have the same error:
>
> On storm (HDF 2.1):
>
> Caused by: org.apache.hive.hcatalog.streaming.TransactionError: Unable
> to acquire lock on {metaStoreUri='thrift://hdp1.local:9083',
> database='default', table='stock_prices', partitionVals=[Marcin] } at
> 
> org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransactionImpl(HiveEndPoint.java:575)
> ~[stormjar.jar:?]
>
> On hive metastore (HDP 2.5):
>
> 2017-03-29 11:56:29,926 ERROR [pool-5-thread-17]:
> server.TThreadPoolServer (TThreadPoolServer.java:run(297)) - Error
> occurred during processing of message.
> java.lang.IllegalStateException: Unexpected DataOperationType: UNSET
> agentInfo=Unknown txnid:54 at
> 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:938)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:814)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaSt
> Regards,
> Marcin Kasiński
> http://itzone.pl
>
>
> On 27 March 2017 at 22:01, Marcin Kasiński  
> wrote:
> > Hello.
> >
> > Thank you for reply.
> >
> > I do really want to solve it.
> >
> > I'm sure I compiled the sources again with the new jars.
> >
> > I've changed source from 

Re: ORC split

2017-03-31 Thread Prasanth Jayachandran
Please find answers inline.

Thanks
Prasanth
_
From: Alberto Ramón
Sent: Friday, March 31, 2017 9:32 AM
Subject: ORC split
To:


Some doubts about ORC:


1- hive.exec.orc.default.buffer.size is used for read or write?
Configurable only during write. The writer writes this buffer size into the file
footer, which readers use during decompression.
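So the knob only matters at write time; a minimal sketch (the property name is from Hive's ORC configuration, and 262144 bytes is its stock default):

```sql
-- affects files being written; readers pick the buffer size up from the footer
SET hive.exec.orc.default.buffer.size=262144;  -- 256 KB
```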

2- orc.stripe.size is compressed or uncompressed?

Both. The stripe size is essentially the sum of all buffers of all columns (plus
the dictionary size) held in memory.

3- orc.stripe.size must be a multiple of the HDFS block size?

It is optimal to have it as a multiple of the HDFS block size; otherwise the
writer will adjust the last stripe within a block so as to not straddle an HDFS
block boundary, or pad the remaining space if it is less than 5% of the block
size. Note that the HDFS block size can be configured via orc.block.size and is
independent of the cluster-wide block size. The default stripe size is 64 MB and
the default block size is 256 MB.
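Both sizes can also be set per table through ORC table properties; a sketch using the defaults mentioned above (the table name and columns are illustrative, values are in bytes):

```sql
CREATE TABLE orc_demo (id INT, val STRING)
STORED AS ORC
TBLPROPERTIES (
  'orc.stripe.size'='67108864',     -- 64 MB stripe
  'orc.block.size'='268435456');    -- 256 MB block
```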


4- For reading an ORC file, does the number of mappers depend on HDFS blocks or
on the stripe count?

Depends. If predicate pushdown is enabled, each split could have one or more
stripes. If predicate pushdown is disabled, adjacent stripes are grouped
together up to the HDFS block size to form a single split.

Let's say we have 3 stripes: if the 2nd stripe does not satisfy the predicate,
then the 1st and 3rd stripes will become 2 separate splits and the 2nd stripe
will be ignored. If predicate pushdown is disabled, all 3 stripes will together
form a single split, as that is less than the block boundary.

Number of splits will vary based on input format and execution engine.

5- hive.exec.orc.split.strategy is used for read?

Yes.
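A sketch of the read-time setting and its accepted values (HYBRID is the default; the one-line summaries are a simplification of Hive's documented behavior):

```sql
-- BI: generate splits quickly without reading file footers
-- ETL: read footers first to plan better splits
-- HYBRID: choose between the two based on file count and sizes
SET hive.exec.orc.split.strategy=HYBRID;
```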






Re: Storm hive bolt

2017-03-31 Thread Marcin Kasiński
You can download my project from this link: http://itzone.pl/tmp234der/StormSample.zip

This is a simple topology with a Kafka spout (it works), an HBase bolt (it
works) and a Hive bolt (it doesn't work).

I've created a Hive table:

CREATE TABLE stock_prices(
  day DATE,
  open FLOAT,
  high FLOAT,
  low FLOAT,
  close FLOAT,
  volume INT,
  adj_close FLOAT
)
PARTITIONED BY (name STRING)
CLUSTERED BY (day) into 5 buckets
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
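Streaming ingest into a transactional table like this also requires ACID to be enabled on the Hive side; a sketch of the usual settings per Hive's transactions documentation (set in hive-site.xml or per session; the worker-thread count is just a common starting value):

```sql
-- required for transactional (ACID) tables and streaming ingest
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
-- compactor (metastore side), so delta files get merged
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;
```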


I've created an HBase table:


create 'stock_prices', 'cf'

I've created a Kafka topic:

/usr/hdf/current/kafka-broker/bin/kafka-topics.sh --create --zookeeper
hdf1.local:2181,hdf2.local:2181,hdf3.local:2181 --replication-factor 3
--partitions 3 --topic my-topic


I've deployed the app to Storm:

storm jar /root/StormSample-0.0.1-SNAPSHOT.jar
mk.stormkafka.KafkaSpoutTestTopology  MKjobarg1XXX


When I deploy it and publish a sample message to the Kafka topic, I cannot save
the data to the Hive table.

I'm 100% sure I have Hive configured OK, because I can add data to this table
manually outside Storm.

The sample message:

2017-03-30,11,12,13,14,15,16,Marcin2345



Caused by: org.apache.hive.hcatalog.streaming.TransactionError: Unable
to acquire lock on {metaStoreUri='thrift://hdp1.local:9083',
database='default', table='stock_prices', partitionVals=[Marcin] }
at 
org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransactionImpl(HiveEndPoint.java:575)
~[stormjar.jar:?]
at 
org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransaction(HiveEndPoint.java:544)
~[stormjar.jar:?]
at org.apache.storm.hive.common.HiveWriter.nextTxnBatch(HiveWriter.java:259)
~[stormjar.jar:?]
at org.apache.storm.hive.common.HiveWriter.(HiveWriter.java:72)
~[stormjar.jar:?]
... 13 more
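The failing lock acquisition above can also be inspected from the Hive side while the topology is running; a hedged sketch (SHOW LOCKS is standard HiveQL; with the DbTxnManager it shows transaction locks for the table):

```sql
-- run in beeline/hive CLI while the bolt is failing
SHOW LOCKS stock_prices;
```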


I think it could be a pom dependencies problem.

I have no idea how I can fix it.

Can you help me?
Regards,
Marcin Kasiński
http://itzone.pl


On 31 March 2017 at 12:35, Igor Kuzmenko  wrote:
> Check this example:
> https://github.com/hortonworks/storm-release/blob/HDP-2.5.0.0-tag/external/storm-hive/src/test/java/org/apache/storm/hive/bolt/HiveTopology.java
>
> If you can, please post your topology code. It's strange that you are using
> the org.apache.hadoop.hive package directly.
>
>
> On Fri, Mar 31, 2017 at 1:08 PM, Marcin Kasiński 
> wrote:
>>
>> After changing it I have lots of errors in Eclipse:
>> "Description  Resource  Path  Location  Type
>> The import org.apache.hadoop.hive cannot be resolved
>> TestHiveBolt.java  /StormSample/src/mk/storm/hive  line 26  Java Problem
>> "
>>
>> Do you have a hello-world storm-hive project (HDP 2.5 and HDF 2.1)?
>>
>> Can you send it to me and I will try it?
>>
>>
>> Regards,
>> Marcin Kasiński
>> http://itzone.pl
>>
>>
>> On 31 March 2017 at 11:30, Igor Kuzmenko  wrote:
>> > I'm using hive streaming bolt with HDP 2.5.0.0.
>> > Try this:
>> >
>> > <repositories>
>> >   <repository>
>> >     <id>hortonworks</id>
>> >     <url>http://nexus-private.hortonworks.com/nexus/content/groups/public/</url>
>> >   </repository>
>> > </repositories>
>> >
>> > <dependency>
>> >   <groupId>org.apache.storm</groupId>
>> >   <artifactId>storm-hive</artifactId>
>> >   <version>1.0.1.2.5.0.0-1245</version>
>> > </dependency>
>> >
>> >
>> > On Fri, Mar 31, 2017 at 11:32 AM, Marcin Kasiński
>> >  wrote:
>> >>
>> >> Hi Eugene.
>> >>
>> >> Below you have my pom file.
>> >>
>> >> Can you check it and fix it to use the repositories in the proper way, please?
>> >>
>> >> I've been working on this problem for over 2 weeks and I'm losing hope.
>> >>
>> >> [snip: same pom as quoted in full at the top of this thread]

ORC split

2017-03-31 Thread Alberto Ramón
Some doubts about ORC:


*1- hive.exec.orc.default.buffer.size* is used for read or write?

*2- orc.stripe.size* is compressed or uncompressed?

*3- orc.stripe.size* must be a multiple of the HDFS block size?

4- For reading an ORC file, does the number of mappers depend on HDFS blocks or
stripe count?

*5- hive.exec.orc.split.strategy* is used for read?