Hi, thanks for your reply Shahab. I don't know what my source file's encoding is; I exported the table from MySQL as a CSV file and imported it into Cassandra. In both MySQL and Cassandra the temprature column type is float, so I don't know how I would set those to UTF-8. About your 2nd point: no, I'm not sure it works. Actually, I don't know how to read two different columns in a Cassandra MapReduce job and assign their values to separate variables. How could I do that? And the third: yes, my column name really is temprature. Please help me figure out what to do.
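For what it's worth, here is a minimal sketch of the pattern Shahab is describing: branch on the column name first, and only then decode each value with the conversion that matches its stored type. This uses plain java.nio instead of Cassandra's ByteBufferUtil, and the sample bytes are made up to stand in for what the mapper would receive, so treat it as an illustration of the idea rather than working job code:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class TwoColumnDecode {
    public static void main(String[] args) {
        // Stand-ins for the column map a mapper would receive:
        // 'date' is UTF-8 text, 'temprature' is an 8-byte double.
        Map<String, ByteBuffer> columns = new LinkedHashMap<>();
        columns.put("date", ByteBuffer.wrap("2013-10-05".getBytes(StandardCharsets.UTF_8)));
        columns.put("temprature", ByteBuffer.allocate(8).putDouble(0, 23.5));

        String date = null;
        double temprature = Double.NaN;
        for (Map.Entry<String, ByteBuffer> column : columns.entrySet()) {
            ByteBuffer value = column.getValue().duplicate(); // don't disturb the original buffer
            if ("date".equalsIgnoreCase(column.getKey())) {
                // text column -> decode as a UTF-8 string only
                date = StandardCharsets.UTF_8.decode(value).toString();
            } else if ("temprature".equalsIgnoreCase(column.getKey())) {
                // numeric column -> read the raw bytes as a double only
                temprature = value.getDouble();
            }
        }
        System.out.println(date + " -> " + temprature);
    }
}
```

The key point is that each value gets exactly one conversion, chosen by column name, instead of running both `string(...)` and `toDouble(...)` on every value.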
On Sat, Oct 5, 2013 at 7:13 PM, Shahab Yunus <[email protected]> wrote:

> A couple of things I can think of. Others might have better ideas.
>
> 1- The exception is about an encoding mismatch. Do you know what your
> source file's encoding is and what your system's default is? E.g. it can be
> ISO8859-1 on Windows, UTF-8 on Linux etc., and your file has something else.
> You can explicitly use UTF-8 everywhere if you want. There is a wealth of
> information available on the net if you google it.
>
> 2- This is more of an aside: you are parsing your data to float and String
> without checking which column it is. Basically you do the following two
> conversions in all cases, no matter what the column, so what will happen if
> the column is date and the toDouble statement is called?
>
> String value1 = ByteBufferUtil.string(column.getValue());
> double value2 = ByteBufferUtil.toDouble(column.getValue());
>
> Did it ever work?
>
> 3- Is the column name in your source data files 'temperature' or
> 'temprature'? You are using the latter in your code, and if it is not what
> is in the data then you might be trying to parse an empty or malformed string.
>
> Regards,
> Shahab
>
> On Sat, Oct 5, 2013 at 5:16 AM, Anseh Danesh <[email protected]> wrote:
>
>> Hi all... I have a question. In the Cassandra wordcount MapReduce with
>> cql3, I want to get a string column and a float (or double) column as map
>> input key and value. I mean I want to get the date column of type string as
>> key and the temprature column of type float as value. But when I println
>> the value of temprature it shows me some of them and then an error...
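Shahab's first two points can be reproduced without Cassandra at all: the raw bytes of a stored double are generally not valid UTF-8, so a strict decoder rejects them with the same MalformedInputException the job hits; conversely, interpreting the ASCII bytes of a date string as a double yields a meaningless tiny number on the order of 1e-67, roughly the magnitude of the stray values in the job output below. A small sketch using plain java.nio (the sample values are made up):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    public static void main(String[] args) throws Exception {
        // A double such as 23.5 serializes to 40 37 80 00 00 00 00 00;
        // the lone 0x80 continuation byte makes it invalid UTF-8.
        ByteBuffer doubleBytes = ByteBuffer.allocate(8).putDouble(0, 23.5);
        try {
            // newDecoder() reports malformed input instead of replacing it.
            StandardCharsets.UTF_8.newDecoder().decode(doubleBytes.duplicate());
            System.out.println("decoded cleanly");
        } catch (CharacterCodingException e) {
            System.out.println("double bytes are not text: " + e.getClass().getSimpleName());
        }

        // The reverse mistake: reading the first 8 ASCII bytes of a date
        // string as a double produces a tiny nonsense value (~1e-67 scale).
        ByteBuffer textBytes = ByteBuffer.wrap("2013-10-05".getBytes(StandardCharsets.US_ASCII));
        System.out.println(textBytes.getDouble());
    }
}
```

So both symptoms in the log (the ~6e-67 printouts and the MalformedInputException) are consistent with applying the wrong conversion to each column.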
>> here is the code:
>>
>> package org.apache.cassandra.com;
>>
>> import java.io.IOException;
>> import java.nio.ByteBuffer;
>> import java.util.*;
>> import java.util.Map.Entry;
>>
>> import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
>> import org.apache.cassandra.hadoop.cql3.CqlOutputFormat;
>> import org.slf4j.Logger;
>> import org.slf4j.LoggerFactory;
>>
>> import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat;
>> import org.apache.cassandra.hadoop.ConfigHelper;
>> import org.apache.cassandra.utils.ByteBufferUtil;
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.conf.Configured;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.io.IntWritable;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.mapreduce.Job;
>> import org.apache.hadoop.mapreduce.Mapper;
>> import org.apache.hadoop.mapreduce.Reducer;
>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>> import org.apache.hadoop.util.Tool;
>> import org.apache.hadoop.util.ToolRunner;
>>
>> import java.nio.charset.CharacterCodingException;
>>
>> public class dewpoint extends Configured implements Tool
>> {
>>     private static final Logger logger = LoggerFactory.getLogger(dewpoint.class);
>>
>>     static final String KEYSPACE = "weather";
>>     static final String COLUMN_FAMILY = "momentinfo";
>>
>>     static final String OUTPUT_REDUCER_VAR = "output_reducer";
>>     static final String OUTPUT_COLUMN_FAMILY = "output_words";
>>
>>     private static final String OUTPUT_PATH_PREFIX = "/tmp/dewpointt";
>>
>>     private static final String PRIMARY_KEY = "row_key";
>>
>>     public static void main(String[] args) throws Exception
>>     {
>>         // Let ToolRunner handle generic command-line options
>>         ToolRunner.run(new Configuration(), new dewpoint(), args);
>>         System.exit(0);
>>     }
>>
>>     public static class TokenizerMapper extends Mapper<Map<String, ByteBuffer>, Map<String, ByteBuffer>, Text, IntWritable>
>>     {
>>         private final static IntWritable one = new IntWritable(1);
>>         private Text date = new Text();
>>
>>         public void map(Map<String, ByteBuffer> keys, Map<String, ByteBuffer> columns, Context context) throws IOException, InterruptedException
>>         {
>>             for (Entry<String, ByteBuffer> column : columns.entrySet())
>>             {
>>                 if (!"date".equalsIgnoreCase(column.getKey()) &&
>>                     !"temprature".equalsIgnoreCase(column.getKey()))
>>                     continue;
>>
>>                 String value1 = ByteBufferUtil.string(column.getValue());
>>                 double value2 = ByteBufferUtil.toDouble(column.getValue());
>>                 System.out.println(value2);
>>                 .....
>>
>> and here is the error:
>>
>> 13/10/05 12:36:22 INFO com.dewpoint: output reducer type: filesystem
>> 13/10/05 12:36:24 INFO util.NativeCodeLoader: Loaded the native-hadoop library
>> 13/10/05 12:36:24 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
>> 13/10/05 12:36:26 INFO mapred.JobClient: Running job: job_local1875596001_0001
>> 13/10/05 12:36:27 INFO mapred.LocalJobRunner: Waiting for map tasks
>> 13/10/05 12:36:27 INFO mapred.LocalJobRunner: Starting task: attempt_local1875596001_0001_m_000000_0
>> 13/10/05 12:36:27 INFO util.ProcessTree: setsid exited with exit code 0
>> 13/10/05 12:36:27 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1e2670b
>> 13/10/05 12:36:27 INFO mapred.MapTask: Processing split: ColumnFamilySplit((5366152502320075885, '9070993788622720120] @[localhost])
>> 13/10/05 12:36:27 INFO mapred.MapTask: io.sort.mb = 100
>> 13/10/05 12:36:27 INFO mapred.JobClient: map 0% reduce 0%
>> 13/10/05 12:36:28 INFO mapred.MapTask: data buffer = 79691776/99614720
>> 13/10/05 12:36:28 INFO mapred.MapTask: record buffer = 262144/327680
>> 6.00457842484433E-67
>> 13/10/05 12:36:30 INFO mapred.MapTask: Starting flush of map output
>> 13/10/05 12:36:30 INFO mapred.MapTask: Finished spill 0
>> 13/10/05 12:36:30 INFO mapred.LocalJobRunner: Starting task: attempt_local1875596001_0001_m_000001_0
>> 13/10/05 12:36:30 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1579a30
>> 13/10/05 12:36:30 INFO mapred.MapTask: Processing split: ColumnFamilySplit((-5699318449577318512, '-2034684803435882987] @[localhost])
>> 13/10/05 12:36:30 INFO mapred.MapTask: io.sort.mb = 100
>> 13/10/05 12:36:32 INFO mapred.MapTask: data buffer = 79691776/99614720
>> 13/10/05 12:36:32 INFO mapred.MapTask: record buffer = 262144/327680
>> 6.004578424845004E-67
>> 13/10/05 12:36:32 INFO mapred.MapTask: Starting flush of map output
>> 13/10/05 12:36:32 INFO mapred.MapTask: Finished spill 0
>> 13/10/05 12:36:32 INFO mapred.LocalJobRunner: Starting task: attempt_local1875596001_0001_m_000002_0
>> 13/10/05 12:36:32 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@112da40
>> 13/10/05 12:36:32 INFO mapred.MapTask: Processing split: ColumnFamilySplit((1684704676388456087, '5366152502320075885] @[localhost])
>> 13/10/05 12:36:32 INFO mapred.MapTask: io.sort.mb = 100
>> 13/10/05 12:36:32 INFO mapred.MapTask: data buffer = 79691776/99614720
>> 13/10/05 12:36:32 INFO mapred.MapTask: record buffer = 262144/327680
>> 1.4273722733722645E-71
>> 13/10/05 12:36:32 INFO mapred.MapTask: Starting flush of map output
>> 13/10/05 12:36:32 INFO mapred.MapTask: Finished spill 0
>> 13/10/05 12:36:32 INFO mapred.LocalJobRunner: Starting task: attempt_local1875596001_0001_m_000003_0
>> 13/10/05 12:36:32 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@126a29c
>> 13/10/05 12:36:32 INFO mapred.MapTask: Processing split: ColumnFamilySplit((-9223372036854775808, '-5699318449577318512] @[localhost])
>> 13/10/05 12:36:32 INFO mapred.MapTask: io.sort.mb = 100
>> 13/10/05 12:36:33 INFO mapred.MapTask: data buffer = 79691776/99614720
>> 13/10/05 12:36:33 INFO mapred.LocalJobRunner:
>> 13/10/05 12:36:33 INFO mapred.MapTask: record buffer = 262144/327680
>> 6.00457842484433E-67
>> 13/10/05 12:36:33 INFO mapred.MapTask: Starting flush of map output
>> 13/10/05 12:36:33 INFO mapred.MapTask: Finished spill 0
>> 13/10/05 12:36:33 INFO mapred.LocalJobRunner: Starting task: attempt_local1875596001_0001_m_000004_0
>> 13/10/05 12:36:33 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2f2295
>> 13/10/05 12:36:33 INFO mapred.MapTask: Processing split: ColumnFamilySplit((-2034684803435882987, '1684704676388456087] @[localhost])
>> 13/10/05 12:36:33 INFO mapred.MapTask: io.sort.mb = 100
>> 13/10/05 12:36:34 INFO mapred.MapTask: data buffer = 79691776/99614720
>> 13/10/05 12:36:34 INFO mapred.MapTask: record buffer = 262144/327680
>> 13/10/05 12:36:34 INFO mapred.JobClient: map 16% reduce 0%
>> 6.004595404242602E-67
>> 13/10/05 12:36:34 INFO mapred.MapTask: Starting flush of map output
>> 13/10/05 12:36:34 INFO mapred.MapTask: Finished spill 0
>> 13/10/05 12:36:34 INFO mapred.LocalJobRunner: Starting task: attempt_local1875596001_0001_m_000005_0
>> 13/10/05 12:36:34 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1706da8
>> 13/10/05 12:36:34 INFO mapred.MapTask: Processing split: ColumnFamilySplit((9070993788622720120, '-9223372036854775808] @[localhost])
>> 13/10/05 12:36:34 INFO mapred.MapTask: io.sort.mb = 100
>> 13/10/05 12:36:34 INFO mapred.MapTask: data buffer = 79691776/99614720
>> 13/10/05 12:36:34 INFO mapred.MapTask: record buffer = 262144/327680
>> 6.004601064041352E-67
>> 13/10/05 12:36:34 INFO mapred.MapTask: Starting flush of map output
>> 13/10/05 12:36:34 INFO mapred.MapTask: Finished spill 0
>> 13/10/05 12:36:34 INFO mapred.LocalJobRunner: Map task executor complete.
>> 13/10/05 12:36:34 WARN mapred.LocalJobRunner: job_local1875596001_0001
>> java.lang.Exception: java.nio.charset.MalformedInputException: Input length = 1
>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
>> Caused by: java.nio.charset.MalformedInputException: Input length = 1
>>     at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
>>     at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:781)
>>     at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:167)
>>     at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:124)
>>     at org.apache.cassandra.com.dewpoint$TokenizerMapper.map(dewpoint.java:65)
>>     at org.apache.cassandra.com.dewpoint$TokenizerMapper.map(dewpoint.java:1)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
>>     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>     at java.lang.Thread.run(Thread.java:662)
>> 13/10/05 12:36:35 INFO mapred.JobClient: Job complete: job_local1875596001_0001
>> 13/10/05 12:36:35 INFO mapred.JobClient: Counters: 15
>> 13/10/05 12:36:35 INFO mapred.JobClient:   FileSystemCounters
>> 13/10/05 12:36:35 INFO mapred.JobClient:     FILE_BYTES_READ=2713
>> 13/10/05 12:36:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=53478
>> 13/10/05 12:36:35 INFO mapred.JobClient:   File Input Format Counters
>> 13/10/05 12:36:35 INFO mapred.JobClient:     Bytes Read=0
>> 13/10/05 12:36:35 INFO mapred.JobClient:   Map-Reduce Framework
>> 13/10/05 12:36:35 INFO mapred.JobClient:     Map output materialized bytes=23
>> 13/10/05 12:36:35 INFO mapred.JobClient:     Combine output records=1
>> 13/10/05 12:36:35 INFO mapred.JobClient:     Map input records=1
>> 13/10/05 12:36:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
>> 13/10/05 12:36:35 INFO mapred.JobClient:     Spilled Records=1
>> 13/10/05 12:36:35 INFO mapred.JobClient:     Map output bytes=15
>> 13/10/05 12:36:35 INFO mapred.JobClient:     CPU time spent (ms)=0
>> 13/10/05 12:36:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=363921408
>> 13/10/05 12:36:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
>> 13/10/05 12:36:35 INFO mapred.JobClient:     Combine input records=1
>> 13/10/05 12:36:35 INFO mapred.JobClient:     Map output records=1
>> 13/10/05 12:36:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=103
>>
>> what does it mean?
>>
>
